Abstract

Recent developments in the field of digital design and hardware verification have
found great use for restricted forms of branching programs. In particular, oblivious 
read-once branching programs (also called "OBDD's") are central to a very common 
technique for verifying circuits. These programs are useful because they are easily 
manipulated and compared for equivalence. However, their utility is limited because 
they cannot compute in polynomial size several simple functions — most notably, integer 
multiplication. This limitation has prompted the consideration of alternative models, 
usually restricted classes of branching programs, in the hope of finding one with greater 
computational power but also easily manipulated and tested for equivalence. 

Read-once (non-oblivious) branching programs can to some degree be manipulated 
and tested for equivalence, but it has been an open question whether they can compute 
integer multiplication in polynomial size. The main result of this thesis proves that 
they cannot: multiplication requires size $2^{\Omega(\sqrt{n})}$. This is the first lower bound for
multiplication on non-oblivious branching programs. By defining the appropriate kind 
of problem reduction, which we call read-once reductions, we are able to show that our 
result implies the same asymptotic lower bound for other arithmetic functions. 

We also survey known results about the various alternative models, describing the 
main techniques used for thinking about their computation and for proving lower 
bounds. These techniques are illustrated with two proofs that have not appeared 
in the literature. We summarize the known results by taking a structural approach of 
comparing the complexity classes corresponding to the various models. 
Chapter 1
Introduction 



Branching programs have recently been found very useful in the field of hardware
verification. The central problem of verification is to check whether a combinational
hardware circuit has been correctly designed. One approach commonly employed today 
is to convert independently the circuit description and the function specification to a 
common intermediate representation and then test whether the two representations are 
equivalent (e.g., [Br92, We94]). The use of restricted forms of branching programs for 
the intermediate representation has made this approach feasible and very popular — 
several software packages are available for implementing this very strategy [Kr94, Br92]. 
This application raises several issues of computational complexity, renewing interest 
in the low-level complexity of branching programs. This thesis explores some of these 
issues from a computational complexity-theoretic point of view. 

1.1 The role of branching programs in hardware verification 

Most of the computational models considered as candidates for the intermediate rep- 
resentation are restricted classes of branching programs. A branching program is a 
directed acyclic graph with a distinguished root node and two sink nodes. The sink
nodes are labeled 0 and 1, and each non-sink node is labeled with an input variable $x_i$,
$i \in [n]$, and has two outgoing edges, labeled 0 and 1. A branching program computes
a Boolean function $f : \{0,1\}^n \to \{0,1\}$ in the natural manner: each assignment of
Boolean values to the variables $x_i$ defines a unique path through the graph from the
root to one of the sinks; the label of that sink defines the value of the function on
that input. The size of a branching program is its number of nodes. Since branching
programs are a non-uniform model of computation, asymptotic statements about size 
refer to families of branching programs containing one program for each input size. 
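As a concrete illustration of the definition, the following Python sketch evaluates a branching program. The representation is a hypothetical choice made only for illustration: the sinks are the integers 0 and 1, and every other node is a name mapped to a triple (index of the tested variable, 0-successor, 1-successor). Evaluation follows the unique path selected by the input, and the size is the number of nodes.

```python
from typing import Dict, Sequence, Tuple, Union

# Hypothetical representation: a node is a sink (the int 0 or 1) or the name
# of an internal node; each internal node maps to
# (index of the tested variable, 0-successor, 1-successor).
Node = Union[int, str]
Program = Dict[str, Tuple[int, Node, Node]]


def evaluate(program: Program, root: Node, x: Sequence[int]) -> int:
    """Follow the unique path selected by input x and return the sink reached."""
    node = root
    while not isinstance(node, int):   # stop once a sink (0 or 1) is reached
        var, succ0, succ1 = program[node]
        node = succ1 if x[var] else succ0
    return node


def size(program: Program) -> int:
    """Size = number of nodes: the internal nodes plus the two sinks."""
    return len(program) + 2


# x0 XOR x1: one node testing x0, then one of two nodes testing x1.
XOR2: Program = {"a": (0, "b0", "b1"), "b0": (1, 0, 1), "b1": (1, 1, 0)}
print([evaluate(XOR2, "a", x) for x in ([0, 0], [0, 1], [1, 0], [1, 1])])
# prints [0, 1, 1, 0]
```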

The circuit to be verified is assumed to be an ordinary combinational single-output 
circuit, built up from a standard basis of Boolean functions such as $\{\wedge, \vee, \neg\}$. The
typical algorithm for constructing the intermediate representation from the circuit is
to work bottom-up through the circuit, from the inputs to the output, combining the
representations appropriately at each gate. Thus, the algorithm need only compute a
representation for $f \wedge g$, $f \vee g$, and $\neg f$, when given representations for $f$ and $g$. In the
literature, these are called the "synthesis operations". It is easy to see that arbitrary 
polynomial-size branching programs are closed under these operations. 
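The closure claim follows from a simple construction, sketched below in Python under a hypothetical node representation (sinks 0 and 1; internal node name mapped to (variable, 0-successor, 1-successor)). For $f \wedge g$, the accepting sink of the program for $f$ is redirected to the root of the program for $g$; for $f \vee g$, the rejecting sink is redirected instead; for $\neg f$, the two sinks are exchanged. The result has size at most the sum of the input sizes, but it is generally not read-once even when both inputs are, since a path may read a variable once in each part.

```python
from typing import Dict, Tuple, Union

Node = Union[int, str]                       # sink 0/1 or internal node name
Program = Dict[str, Tuple[int, Node, Node]]  # name -> (var, succ0, succ1)


def evaluate(p: Program, root: Node, x) -> int:
    node = root
    while not isinstance(node, int):
        var, s0, s1 = p[node]
        node = s1 if x[var] else s0
    return node


def relabel(p: Program, root: Node, tag: str, sinks: Dict[int, Node]):
    """Prefix every node name with tag and redirect sinks 0/1 via `sinks`."""
    def m(v: Node) -> Node:
        return sinks[v] if isinstance(v, int) else tag + v
    return {tag + n: (var, m(s0), m(s1)) for n, (var, s0, s1) in p.items()}, m(root)


def negate(p: Program, root: Node):
    """NOT f: exchange the accepting and rejecting sinks."""
    return relabel(p, root, "n.", {0: 1, 1: 0})


def conjoin(p: Program, rp: Node, q: Program, rq: Node):
    """f AND g: wherever f's program would accept, continue with g's program."""
    q2, rq2 = relabel(q, rq, "g.", {0: 0, 1: 1})
    p2, rp2 = relabel(p, rp, "f.", {0: 0, 1: rq2})
    return {**p2, **q2}, rp2


def disjoin(p: Program, rp: Node, q: Program, rq: Node):
    """f OR g: wherever f's program would reject, continue with g's program."""
    q2, rq2 = relabel(q, rq, "g.", {0: 0, 1: 1})
    p2, rp2 = relabel(p, rp, "f.", {0: rq2, 1: 1})
    return {**p2, **q2}, rp2
```

For example, with `{"a": (0, 0, 1)}` computing $x_0$ and `{"b": (1, 0, 1)}` computing $x_1$, `conjoin` yields a two-node program for $x_0 \wedge x_1$.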

This strategy for verification has several shortcomings that are immediately ap- 
parent. First, unrestricted polynomial-size branching programs compute exactly those 
functions in non-uniform logspace. Therefore, if the intermediate representation is a 
restricted form of branching program, we clearly cannot hope for a general algorithm 
to compute a polynomial-size representation (polynomial in the size of the original circuit) unless $\mathrm{L/poly} = \mathrm{P/poly}$. This difficulty has largely been accepted as inherent and
not critical, since functions computed at the level of hardware are not generally complex
and are in fact in L anyway. A second observation is that efficient algorithms for the
individual synthesis operations do not imply that the resulting bottom-up algorithm for
computing a representation is efficient: for example, if the output of each operation has
size equal to the product of the sizes of the input representations, the final representation will have
size exponential in the size of the original circuit. Despite this problem, researchers 
have been content with the bottom-up algorithm as long as each synthesis operation 
can be performed efficiently. 

Finally, there is the problem of testing whether the two branching programs, cor- 
responding to the circuit and the specification, are equivalent. It is easy to see that 
this problem is co-NP-complete: Given a 3-CNF with variables $\{x_1, \ldots, x_n\}$, we may
construct a branching program on the same variables which accepts exactly when
the formula is satisfied. A polynomial-time algorithm for equivalence then clearly gives 
a polynomial-time algorithm for 3-SAT by comparing this program with the trivial 
branching program that always rejects; to say they are not equivalent is to say the
formula is satisfiable.
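The construction used in this reduction is straightforward; one possible version is sketched below in Python. Clauses are given as lists of DIMACS-style literals (an assumption of the sketch: +i denotes $x_i$ and -i denotes $\neg x_i$, with variables numbered from 1), and the program checks the clauses one after another, trying the literals of each clause in turn. Its size is the number of literal occurrences plus the two sinks, hence polynomial in the size of the formula; in general it is not read-once.

```python
from itertools import product
from typing import Dict, List, Tuple, Union

Node = Union[int, str]                       # sink 0/1 or internal node name
Program = Dict[str, Tuple[int, Node, Node]]  # name -> (var, succ0, succ1)


def evaluate(p: Program, root: Node, x) -> int:
    node = root
    while not isinstance(node, int):
        var, s0, s1 = p[node]
        node = s1 if x[var] else s0
    return node


def cnf_to_program(clauses: List[List[int]]):
    """Branching program accepting exactly the assignments satisfying the CNF.
    Assumes every clause is nonempty; node "c{j}.{t}" tests literal t of clause j."""
    p: Program = {}

    def start(j: int) -> Node:
        # First node of clause j, or the accepting sink once all clauses pass.
        return f"c{j}.0" if j < len(clauses) else 1

    for j, clause in enumerate(clauses):
        for t, lit in enumerate(clause):
            sat = start(j + 1)                                   # literal true
            nxt = f"c{j}.{t + 1}" if t + 1 < len(clause) else 0  # literal false
            var = abs(lit) - 1
            p[f"c{j}.{t}"] = (var, nxt, sat) if lit > 0 else (var, sat, nxt)
    return p, start(0)


def satisfiable(clauses: List[List[int]], n: int) -> bool:
    """Brute force over inputs: the program differs from the always-rejecting
    program exactly when the formula is satisfiable."""
    p, root = cnf_to_program(clauses)
    return any(evaluate(p, root, x) for x in product((0, 1), repeat=n))
```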

1.2 Restricted branching programs 

Because of the difficulty of comparing arbitrary branching programs for equivalence, 
the intermediate representation is instead chosen to be a restricted class of branching 
programs. These are oblivious read-once branching programs, or OBDD's ("ordered 
binary decision diagrams"). 

Definition 1 A branching program is read-once if on every path from the source to a 
sink, each variable appears at most once as the label of a vertex. 

Definition 2 A branching program is oblivious if on every path from the source to a 
sink, the variables appear in the same order. 
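Both properties can be checked without enumerating paths; a Python sketch follows, under a hypothetical node representation (sinks 0 and 1; internal node name mapped to (variable, 0-successor, 1-successor)). A program is read-once exactly when no node labeled $x_i$ can reach, by a nonempty path, another node labeled $x_i$. For a read-once program, obliviousness in the sense of Definition 2 amounts to the relation "$x_i$ is tested immediately before $x_j$ on some path" being acyclic, since exactly then a single ordering of the variables is consistent with every path. The sketch applies that test only to read-once programs, so it recognizes OBDD's; it assumes Python 3.9 or later for `graphlib`.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, FrozenSet, Set, Tuple, Union

Node = Union[int, str]                       # sink 0/1 or internal node name
Program = Dict[str, Tuple[int, Node, Node]]  # name -> (var, succ0, succ1)


def reachable(p: Program, root: Node) -> Set[str]:
    """Internal nodes reachable from the root."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        v = stack.pop()
        if isinstance(v, int) or v in seen:
            continue
        seen.add(v)
        stack.extend(p[v][1:])
    return seen


def is_read_once(p: Program, root: Node) -> bool:
    below: Dict[str, FrozenSet[int]] = {}   # variables tested at or below a node

    def vars_below(v: Node) -> FrozenSet[int]:
        if isinstance(v, int):
            return frozenset()
        if v not in below:
            var, s0, s1 = p[v]
            below[v] = frozenset({var}) | vars_below(s0) | vars_below(s1)
        return below[v]

    # No node may see its own variable again strictly below it.
    return all(p[u][0] not in (vars_below(p[u][1]) | vars_below(p[u][2]))
               for u in reachable(p, root))


def is_obdd(p: Program, root: Node) -> bool:
    if not is_read_once(p, root):
        return False
    preds: Dict[int, Set[int]] = {}   # variable -> variables read just before it
    for u in reachable(p, root):
        var, s0, s1 = p[u]
        preds.setdefault(var, set())
        for s in (s0, s1):
            if not isinstance(s, int):
                preds.setdefault(p[s][0], set()).add(var)
    try:
        tuple(TopologicalSorter(preds).static_order())
        return True                   # one ordering fits every path
    except CycleError:
        return False
```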

Our definition of oblivious is slightly different from the usual definition, which requires 
the branching program to be leveled (for each node, all paths from the source to that node have the
same length) with each node at a given level labeled with the same variable. Our
definition does not require leveling; it is easy to see that any oblivious program may
be leveled at a cost in size of a factor of n, the number of variables. Since we will
primarily be concerned with polynomial versus exponential growth, we will assume
leveled programs or not, as convenient.

Thus, OBDD's may be thought of as non-uniform acyclic finite-state automata. Notice that the read-once property implies that an OBDD is satisfiable exactly when there
exists a path from the source to the accepting sink: since no variable appears more
than once on any path, there is a consistent assignment to the variables corresponding
to that path. An OBDD for $\neg f$ is trivially constructed by exchanging the accepting
and rejecting sinks. Given two OBDD's for $f$ and $g$ that obey the same ordering of the
variables, an OBDD for $f \wedge g$ or $f \vee g$ is easily constructed using the standard product
constructions for finite automata. (This last statement is not true if the two OBDD's
do not obey the same ordering; see Section 2.2.1.) It follows that two OBDD's are
easily tested for equivalence by testing their exclusive-or for satisfiability.
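The product construction and the resulting equivalence test can be sketched in Python as follows. This is a minimal, unreduced version of the product construction written as a memoized recursion over pairs of nodes, assuming a hypothetical node representation (sinks 0 and 1; internal node name mapped to (variable, 0-successor, 1-successor)) and a list `order` giving the common variable ordering of the two OBDD's. Each product node corresponds to a pair of nodes, one from each program, so the result has at most the product of the input sizes; satisfiability is reachability of the accepting sink, and equivalence is unsatisfiability of the exclusive-or.

```python
import operator
from typing import Callable, Dict, List, Tuple, Union

Node = Union[int, str]                       # sink 0/1 or internal node name
Program = Dict[str, Tuple[int, Node, Node]]  # name -> (var, succ0, succ1)


def evaluate(p: Program, root: Node, x) -> int:
    node = root
    while not isinstance(node, int):
        var, s0, s1 = p[node]
        node = s1 if x[var] else s0
    return node


def apply_op(op: Callable[[int, int], int], p: Program, rp: Node,
             q: Program, rq: Node, order: List[int]):
    """Product of two OBDD's obeying the same variable order; op combines sinks."""
    pos = {var: i for i, var in enumerate(order)}
    out: Program = {}
    memo: Dict[Tuple[Node, Node], Node] = {}

    def level(prog: Program, v: Node) -> float:
        return float("inf") if isinstance(v, int) else pos[prog[v][0]]

    def go(u: Node, v: Node) -> Node:
        if isinstance(u, int) and isinstance(v, int):
            return op(u, v)                      # both at sinks: combine labels
        if (u, v) not in memo:
            lvl = min(level(p, u), level(q, v))  # earliest variable still to read
            # A program whose current node tests a later variable stays put.
            u0, u1 = (p[u][1], p[u][2]) if level(p, u) == lvl else (u, u)
            v0, v1 = (q[v][1], q[v][2]) if level(q, v) == lvl else (v, v)
            name = f"n{len(memo)}"               # fresh name for the pair (u, v)
            memo[(u, v)] = name
            out[name] = (order[lvl], go(u0, v0), go(u1, v1))
        return memo[(u, v)]

    return out, go(rp, rq)


def satisfiable(p: Program, root: Node) -> bool:
    """For a read-once program: satisfiable iff the accepting sink is reachable."""
    seen, stack = set(), [root]
    while stack:
        v = stack.pop()
        if isinstance(v, int):
            if v == 1:
                return True
        elif v not in seen:
            seen.add(v)
            stack.extend(p[v][1:])
    return False


def equivalent(p: Program, rp: Node, q: Program, rq: Node, order: List[int]) -> bool:
    return not satisfiable(*apply_op(operator.xor, p, rp, q, rq, order))
```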

Because of the tractability of these operations on OBDD's, they have been the in- 
termediate representation of choice. However, OBDD's are clearly a very weak model 
of computation, and the question arises whether they are sufficiently powerful to meet 
the needs at hand. The answer is yes, for the most part — OBDD's can compute in 
polynomial size such functions as integer addition, symmetric Boolean functions, and 
many of the benchmark functions used by the verification community [BF85] — but with 
a very important exception: exponential size is required to compute integer multiplication [Br91]. This is a serious setback to the viability of OBDD's, since the hardware
to be tested typically contains circuits that perform multiplication. Today, the largest 
multipliers that can be checked using this method have 12-bit inputs; ideally, circuit 
designers would like to check multipliers of 32 or even 64 bits. 

Thus, despite the success of this approach, there has also been great effort expended 
to find another model that is likewise manipulated, but with greater computational
power (e.g., [SDG94, SW95]). Most of these models (k-OBDD's, k-IBDD's, nondeterministic OBDD's) have proven too weak to compute multiplication in polynomial size
(see Chapter 2). A common feature of these models is that they are all oblivious 
branching programs. It is therefore natural to consider non-oblivious programs, the 
simplest of these being read-once programs. 

Unfortunately, read-once programs do not enjoy quite the same degree of manipulability as OBDD's. Determining whether a read-once program is satisfiable is as
simple as for an OBDD, since the read-once property implies that the program is satisfiable exactly when there is a path from the source to the accepting sink. Also,
testing equivalence is reasonably tractable: although it is not known how to do so in
deterministic polynomial time, there is a randomized polynomial-time algorithm with
one-sided error due to Blum, Chandra, and Wegman [BCW80]. The synthesis operations, however, are provably not tractable: there exist functions $f$ and $g$ that each have
polynomial-size read-once programs but whose conjunction $f \wedge g$ requires exponential-size read-once programs. Despite their relative recalcitrance, read-once programs have
been considered by some researchers for possible use in hardware verification [GM94]. 
Until now, however, very little was known about the complexity of multiplication with 
any non-oblivious programs. 

In this thesis, we prove that multiplication requires (non-oblivious) read-once branching programs of size $2^{\Omega(\sqrt{n})}$. This is the first superpolynomial lower bound for multiplication on non-oblivious branching programs. This result demonstrates that relaxing
the ordering restriction of OBDD's is insufficient to gain the desired computational 
power, and thus further strengthening of the model is needed. By defining the ap- 
propriate kind of problem reduction, which we call read-once reductions, we are able 
to show that our result implies the same asymptotic lower bound for other arithmetic 
functions. 

Chapter 2 considers in some detail the other models, all essentially generalizations 
of OBDD's. In addition to summarizing the lower bounds that are known for functions in the
various models, we compare the classes of functions that are computable in polynomial 
size by the models, and also describe the techniques available for proving lower bounds 
in the different models. Included are two simple proofs that have not appeared in 
the literature. Chapter 3 gives the lower bound for multiplication and the problem 
reductions; Chapter 4 concludes with statements of the interesting open problems. 






Chapter 2 
Related models 



In the search for alternatives to OBDD's, many models have been considered. In 
addition to their relevance for hardware verification, they are interesting also for the 
questions of structural complexity that they raise. 

This chapter begins by summarizing the various extensions to OBDD's and read- 
once programs, including adding nondeterminism and allowing variables to be read k 
times. These different models are compared in two respects: (1) the ease with which
such programs are manipulated, and (2) their computational power. 

Section 2.3 summarizes the known lower bounds. We then take a structural view of 
the relationships between the classes of functions computable in polynomial size for the 
various models. We will see that the two restrictions, obliviousness and restricted reading,
are orthogonal to each other: with respect to polynomial size, there are functions
that can be computed with read-once programs but cannot be computed by oblivious
read-k-times programs for any constant k; yet at the same time, there are functions
computable by oblivious read-k-times programs that cannot be computed by (non-oblivious) read-once programs. We will also consider the hierarchies with respect to k
in the various models. 

Section 2.5 briefly outlines the primary techniques for proving lower bounds, in- 
cluding two proofs that have not appeared in the literature. In Section 2.6 we discuss 
the problem of integer multiplication and describe the known lower bounds. Finally,
in Section 2.7 we mention two related issues. 

2.1 Definitions 

We begin with the definitions of the various extensions to the basic models. Recall that 
in a read-once branching program, each variable appears at most once on every path 
from the source to a sink; an OBDD is an oblivious read-once branching program — each 
path through the program inspects the variables in the same order, each at most once. 

Two recently proposed models, which we shall not consider here, are "graph-driven 
BDD's" [SW95] and "binary moment diagrams" [BC94]. The latter are not branching 
programs, and do not compute a function, but they do allow polynomial-size represen- 
tation of multiplication. Also, in [S95] lower bounds are proved on branching programs 
in which for each path, the number of variables appearing more than once is bounded 
by k. In [MW95], lower bounds are proved for nondeterministic programs in which 
each path obeys a bound on the number of alternations between sets of variables. 

2.1.1 Reading each variable k times 

There are essentially three models of branching programs in which each variable may 
be read multiple times: 

1. k-OBDD's (also known as k-BDD's [BSSW93]). On each path the variables appear
at most k times each in an order that is the same permutation repeated k times.

2. k-IBDD's. On each path the variables appear at most k times each in an order
that is the concatenation of k (possibly different) permutations.

3. Read-k-times programs. On each path the variables appear at most k times each.
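The three restrictions differ only in which sequences of variable reads they allow along a path. The Python sketch below states them as predicates on the sequence of variables read along a single path. It reflects our reading of the definitions: in particular, "an order that is the concatenation of k permutations" is taken to mean that the sequence is a subsequence of $\pi_1\pi_2\cdots\pi_k$ (variables may be skipped), and the permutations are supplied by the caller.

```python
from collections import Counter
from typing import List, Sequence


def is_subsequence(seq: Sequence[int], template: Sequence[int]) -> bool:
    """Greedy check that seq is obtained from template by deleting entries."""
    it = iter(template)
    return all(any(v == t for t in it) for v in seq)


def path_read_k(path: Sequence[int], k: int) -> bool:
    """Read-k-times: no variable appears more than k times on the path."""
    return all(c <= k for c in Counter(path).values())


def path_k_ibdd(path: Sequence[int], perms: List[Sequence[int]]) -> bool:
    """k-IBDD: the path follows the concatenation of the k given permutations."""
    return is_subsequence(path, [v for perm in perms for v in perm])


def path_k_obdd(path: Sequence[int], perm: Sequence[int], k: int) -> bool:
    """k-OBDD: the path follows one permutation repeated k times."""
    return path_k_ibdd(path, [perm] * k)
```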

We remark that our definition of read-k-times programs prevents a variable from appearing more than k times on any path from the source to either sink. These are
sometimes referred to as syntactic read-k-times programs, in contrast to semantic
read-k-times programs, in which the limited reading need hold only for those paths
which some input may follow; untraversable paths need not obey the read-k-times
restriction. (The two definitions are equivalent for k = 1.) While the "semantic" definition is perhaps more natural from the point of view of algorithms (upper bounds),
the "syntactic" definition is more combinatorial and more amenable to proving lower
bounds. No lower bounds (for explicit functions) are known for semantic read-k-times
programs.

2.1.2 Nondeterminism 

The simplest and most common way to introduce nondeterminism is to permit some 
nodes to be unlabeled and allow either of the two outgoing edges to be traversed on 
any input. Such a program is said to accept if the input may follow some path from 
the root to an accepting sink — that is, there exists a path in the subgraph induced 
by removing edges that are not traversable. It is not surprising that polynomial-size 
nondeterministic branching programs accept exactly those languages in (nonuniform) 
NL, nondeterministic logspace. 
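Acceptance by such a program is a reachability question. A Python sketch, assuming a hypothetical representation in which an unlabeled node carries the variable `None`: from a labeled node only the edge selected by the input is followed, from an unlabeled node both edges are, and the input is accepted if the accepting sink is reached.

```python
from typing import Dict, Optional, Sequence, Tuple, Union

Node = Union[int, str]
# name -> (variable index, or None for an unlabeled OR node; succ0; succ1)
NdProgram = Dict[str, Tuple[Optional[int], Node, Node]]


def nd_accepts(p: NdProgram, root: Node, x: Sequence[int]) -> bool:
    """Accept iff some traversable path leads from the root to the sink 1."""
    seen, stack = set(), [root]
    while stack:
        v = stack.pop()
        if isinstance(v, int):
            if v == 1:
                return True
            continue
        if v in seen:
            continue
        seen.add(v)
        var, s0, s1 = p[v]
        if var is None:
            stack.extend((s0, s1))              # unlabeled: both edges traversable
        else:
            stack.append(s1 if x[var] else s0)  # labeled: the input decides
    return False


# Guess an index i, then check x_i = 1: accepts iff some variable is 1.
ANY3: NdProgram = {
    "g": (None, "t0", "g2"),
    "g2": (None, "t1", "t2"),
    "t0": (0, 0, 1),
    "t1": (1, 0, 1),
    "t2": (2, 0, 1),
}
```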

We may think of the unlabeled nodes of a nondeterministic branching program as 
being OR nodes. A standard generalization introduces nodes corresponding to other 
binary functions. Allowing AND nodes, for example, naturally enables polynomial-size 
programs to accept languages in co-NL. As NL = co-NL, it happens that allowing 
AND nodes results in the same power as OR nodes for polynomial-size programs.¹
Allowing both AND nodes and OR nodes enables polynomial-size programs to recognize alternating logspace, which is equal to P. By allowing parity nodes, polynomial-size
programs recognize ⊕L, a logspace analogue to ⊕P [KW93]. Meinel [Me89] explores
the range of all possibilities and concludes that allowing nodes of other binary Boolean
functions does not give classes different from L, NL, P, or ⊕L.



¹ It is easy to see that the proof of [Im88] yields the same result in the non-uniform case: Given
a polynomial-size branching program with OR nodes, that proof constructs another polynomial-size
branching program with OR nodes that accepts exactly when the original program rejects. This OR-program for $\neg f$ is easily converted to an AND-program for $f$ by replacing the OR nodes with AND
nodes and switching the accepting and rejecting sinks.
Note that we have not introduced nondeterminism as we would with circuits, where
we would allow nondeterministic variables as inputs. Defining nondeterminism in this 
manner immediately gives (nonuniform) NP for polynomial-size programs, since NP is 
characterized by polynomial-size nondeterministic formulas. 

We mention that Borodin, Razborov and Smolensky [BRS93] use a different defi- 
nition of nondeterminism: Nodes are unlabeled and each edge is either unlabeled or 
labeled with a variable and a value. Unlabeled edges are considered "free" edges which 
may be traversed by any input; labeled edges may of course be traversed only by inputs 
consistent with the label. The measure of size is number of labeled edges, rather than 
number of nodes. The difference in models is not of consequence for our purposes, 
as it is easy to see that the two size measures are within a constant factor of each 
other. Clearly, our nondeterministic branching programs are essentially a special case 
of theirs, and the number of edges in one of our programs is at most twice the number 
of nodes. Conversely, a program in their form is easily converted to one of our form in 
which the number of nodes is at most the number of edges in the original program. 

There is another model of nondeterministic branching programs, called rectifier- 
and-switching networks, which is preferred by Razborov because of the combinatorial 
characterization its size measure affords (see [Ra91, Ra90]). A rectifier-and-switching
network is essentially a nondeterministic branching program as [BRS93] defines them, 
except that the (directed) graph may contain cycles. There is no "rejecting sink" and 
the program accepts exactly when there exists at least one path from the source to 
the (accepting) sink. The measure of size is the number of labeled edges. Again, 
our nondeterministic programs are essentially a special case of rectifier-and-switching
networks. So for a given function, our programs may be larger, but not by more than 
a quadratic factor, as the following transformation demonstrates. To make a network 
of E edges acyclic, place E copies of it in sequence redirecting original "back edges" 
(those edges which lead to a node that is not further from the root) to lead instead 
to the copy of the destination node in the subsequent copy of the graph. At most E 
copies are needed since any path contains at most E edges and an extra copy of the 
graph is needed only for each back edge in the path. Thus at a cost of squaring the 
size, we obtain a nondeterministic program in the sense of [BRS93]. It is not known if 
this measure is within a constant factor of the other two [Ra91, Open Question #1].
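The transformation can be sketched in Python as follows. The sketch assumes a hypothetical edge-list representation (tail, head, label), where the label is either None for a free edge or a pair (variable index, required value); "further from the root" is taken to mean larger breadth-first distance from the source in the unlabeled graph, which is one natural reading of the text. Forward edges stay within a copy, back edges lead into the next copy, and every copy of the sink is joined by a free edge to a single new accepting sink. An accepting path may be assumed simple, and its first edge is a forward edge, so E copies suffice.

```python
from collections import deque
from graphlib import CycleError, TopologicalSorter

# Hypothetical representation: an edge is (tail, head, label), where label is
# None for a free edge or (variable index, required value) for a labeled edge.


def accepts(edges, source, sink, x) -> bool:
    """Is the sink reachable along edges whose labels are consistent with x?"""
    def ok(label):
        return label is None or x[label[0]] == label[1]
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        if u == sink:
            return True
        for a, b, label in edges:
            if a == u and ok(label) and b not in seen:
                seen.add(b)
                stack.append(b)
    return False


def unroll(edges, source, sink):
    """Acyclic network built from E copies; back edges lead into the next copy."""
    E = len(edges)
    dist, queue = {source: 0}, deque([source])
    while queue:                      # BFS distance from the source, ignoring labels
        u = queue.popleft()
        for a, b, _ in edges:
            if a == u and b not in dist:
                dist[b] = dist[u] + 1
                queue.append(b)
    new = []
    for i in range(E):
        for a, b, label in edges:
            if a not in dist:
                continue              # unreachable from the source: drop
            if dist[b] > dist[a]:
                new.append(((a, i), (b, i), label))      # forward edge: same copy
            elif i + 1 < E:
                new.append(((a, i), (b, i + 1), label))  # back edge: next copy
        new.append(((sink, i), "ACCEPT", None))  # every copy of the sink accepts
    return new, (source, 0), "ACCEPT"


def is_acyclic(edges) -> bool:
    preds = {}
    for a, b, _ in edges:
        preds.setdefault(b, set()).add(a)
    try:
        tuple(TopologicalSorter(preds).static_order())
        return True
    except CycleError:
        return False
```

The new network has at most $E^2$ labeled edges, matching the squaring of the size described above.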
2.2 Manipulating branching programs

As explained in Chapter 1, OBDD's have the useful property that they are easily 
manipulated: Given OBDD's for $f$ and $g$ that obey the same ordering of the variables,
it is easy to construct an OBDD for $f \wedge g$ or $f \vee g$. Since the satisfiability of an OBDD
is equivalent to the reachability of the accepting sink, OBDD's are also easily tested for
satisfiability and thus equivalence. We remark that although the synthesis operations
of constructing OBDD's for $f \wedge g$ and $f \vee g$ are intractable if the two given OBDD's do not
obey the same ordering (as shown below), this condition is not necessary for testing 
equivalence. There is a polynomial-time algorithm due to Fortune, Hopcroft, and 
Schmidt [FHS78] for testing whether an OBDD is equivalent to a read-once program, 
which can be used in this case. 

2.2.1 Read-once programs 

Read-once programs do not enjoy quite the same degree of manipulability as their 
oblivious version, OBDD's. The read-once property implies that the program is satisfiable exactly when there is a path from the source to the accepting sink. However,
the synthesis operations are provably not tractable: there exist functions $f$ and $g$ that
each have polynomial-size read-once programs but whose conjunction $f \wedge g$ requires
exponential-size read-once programs. Such an example is the function π-MATRIX of
determining whether an $n \times n$ (0,1)-matrix is a permutation matrix, or equivalently,
whether a bipartite graph with sides $V$ and $W$, where $|V| = |W| = n$, consists exactly of a perfect matching (and no further edges). π-MATRIX requires exponential-size read-once
programs (see Section 2.3.3). On the other hand, it is easy to test that each row
has exactly one 1 or that each column has exactly one 1; in fact, these two functions are easily computed by OBDD's (with different orderings of the variables), and
π-MATRIX is true exactly when both of these functions are true.
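The decomposition used in this example is easy to state in code. The Python sketch below (function names are ours) computes π-MATRIX as the conjunction of a row test and a column test. The row test is computed by an OBDD that reads the matrix row by row and remembers only whether the current row has already contained a 1, so it has $O(n^2)$ nodes; the column test is the same with rows and columns exchanged. The two OBDD's obey different orderings, so the product construction for OBDD's does not apply to them.

```python
from itertools import permutations
from typing import List, Sequence

Matrix = Sequence[Sequence[int]]


def rows_ok(m: Matrix) -> bool:
    """Every row contains exactly one 1 (an OBDD in row-major order)."""
    return all(sum(row) == 1 for row in m)


def cols_ok(m: Matrix) -> bool:
    """Every column contains exactly one 1 (an OBDD in column-major order)."""
    return all(sum(col) == 1 for col in zip(*m))


def pi_matrix(m: Matrix) -> bool:
    """pi-MATRIX: m is a permutation matrix iff both tests pass."""
    return rows_ok(m) and cols_ok(m)


def permutation_matrix(perm: Sequence[int]) -> List[List[int]]:
    n = len(perm)
    return [[int(perm[i] == j) for j in range(n)] for i in range(n)]


print(sum(pi_matrix(permutation_matrix(p)) for p in permutations(range(3))))
# prints 6
```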

It is not known how to determine the equivalence of two read-once programs in 
polynomial time. Blum, Chandra, and Wegman [BCW80] give a co-RP algorithm 
(that is, it may say "equivalent" when in fact the programs are not, but never vice 
versa) which relies on randomly assigning to the literals values from a finite field and
then computing the value of the DNF polynomial of the function. 
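The idea behind such an algorithm can be illustrated with a short sketch; the formulation below is ours and not necessarily the exact one of [BCW80]. Evaluating a read-once program bottom-up over a field, with a node testing $x_i$ computing $(1 - r_i)\cdot v_0 + r_i \cdot v_1$ from the values $v_0, v_1$ of its successors, yields a multilinear polynomial that agrees with the computed function on Boolean inputs. Such a polynomial is unique, so equivalent programs give identical polynomials, while inequivalent ones agree at a random point with probability at most $n/q$ over a field of size $q$. The prime modulus below is chosen only for illustration; a disagreement proves the programs inequivalent, while agreement in every trial yields the possibly wrong answer "equivalent".

```python
import random
from typing import Dict, List, Sequence, Tuple, Union

Node = Union[int, str]                       # sink 0/1 or internal node name
Program = Dict[str, Tuple[int, Node, Node]]  # name -> (var, succ0, succ1)
PRIME = 2**31 - 1                            # illustrative field size


def poly_value(p: Program, root: Node, r: Sequence[int], q: int = PRIME) -> int:
    """Evaluate the arithmetized program at the field point r (mod q)."""
    memo: Dict[str, int] = {}

    def val(v: Node) -> int:
        if isinstance(v, int):
            return v                         # sinks contribute 0 or 1
        if v not in memo:
            var, s0, s1 = p[v]
            memo[v] = ((1 - r[var]) * val(s0) + r[var] * val(s1)) % q
        return memo[v]

    return val(root)


def probably_equivalent(p: Program, rp: Node, q_prog: Program, rq: Node,
                        n: int, trials: int = 20, rng=random) -> bool:
    """One-sided test for read-once programs: False is always correct; True
    is wrong with probability at most (n / PRIME) ** trials."""
    for _ in range(trials):
        r: List[int] = [rng.randrange(PRIME) for _ in range(n)]
        if poly_value(p, rp, r) != poly_value(q_prog, rq, r):
            return False                     # a witness: the programs differ
    return True
```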
2.2.2 Nondeterministic read-once programs

Obviously, OR nodes trivialize the synthesis operation of constructing a program for $f \vee g$. They do not help, however, with constructing $f \wedge g$: the lower bound for π-MATRIX
is actually proved for read-once programs with OR nodes, so a nondeterministic read-once program for $f \wedge g$ may require size exponential in the sizes of the programs for $f$
and $g$. This result may be contrasted with NL = co-NL, which says that polynomial-
size branching programs with OR nodes are equivalent to polynomial-size branching 
programs with AND nodes. We now see that if we restrict the programs to be read- 
once, OR nodes and AND nodes give different computational power [KMW91]. The 
same phenomenon occurs for linear-length oblivious programs [KMW92]. 

Determining the satisfiability of a program with AND nodes is NP-complete by
the example in Section 2.2.4. For OR nodes, testing equivalence is trivially at least as hard as
testing the equivalence of deterministic read-once programs, which is not known
to be in P. In the case of PARITY nodes, the algorithm of [BCW80] works as long
as the field used has characteristic 2 [SDG94]. In [SDG94], simple but very restrictive
conditions on the use of AND and OR gates are given so that the correctness of the
algorithm of [BCW80] is retained.

2.2.3 k-OBDD's

By restricting the order to be the same permutation repeated k times, we retain the 
property that two programs obeying the same ordering are easily combined: the
usual product construction works as before for OBDD's. 

k-OBDD's are also testable for satisfiability, though with a little more effort. Regard
the program as k separate segments corresponding to the k repetitions of the permu- 
tation in which the variables are read. If the size and hence the width is polynomial 
in n, then there are a polynomial number of nodes at the top of each segment. The 
portion of a segment between a particular top node and a "bottom" node (at the top of 
the subsequent segment) may be viewed as an OBDD. For an input to pass through a 
given sequence of k "top nodes" it must satisfy the conjunction of the k corresponding 
OBDD's (with source and accept nodes defined appropriately). To test whether these
k OBDD's are simultaneously satisfiable, we may construct an equivalent OBDD using
the synthesis operation for OBDD's (since these k OBDD's obey the same ordering) and
then check it for satisfiability. For constant k, there are $(\mathrm{poly})^k = \mathrm{poly}$ sequences of "top nodes" that
an input may follow, and the k-OBDD is satisfiable if one of these sequences is satisfiable.
Thus, to determine whether the k-OBDD is satisfiable, we sequentially check whether
any of these sequences is traversable.

Other operations on k-OBDD's are considered in detail in [BSSW93].

2.2.4 k-IBDD's and read-k-times programs

Unlike for k-OBDD's, testing the satisfiability of even 2-IBDD's, and hence read-2-times
programs, is NP-complete. The reduction, from SAT, places in sequence two OBDD's,
one that checks the satisfiability of the formula with each occurrence of a variable uniquely renamed,
and another that checks whether the copies of each original variable have the same value.
Since it includes satisfiability as a special case, testing the equivalence of two k-IBDD's
is also hard.
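The renaming step of this reduction can be sketched in Python; all names are hypothetical and literals are assumed to be DIMACS-style (+i for $x_i$, -i for $\neg x_i$). Each literal occurrence gets its own copy variable. The first predicate checks the clauses on the copies, reading each copy once in order of occurrence; the second checks that all copies of each original variable agree, reading each copy once in an order grouped by original variable. Each predicate needs only constant width, so each is computed by a small OBDD, and running one after the other (with different orderings) gives a 2-IBDD. The two brute-force functions confirm on small formulas that the formula is satisfiable exactly when some assignment to the copies passes both checks.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple


def rename(clauses: List[List[int]]):
    """Replace each literal occurrence by a fresh copy variable. Returns the
    clauses over copies as (copy index, polarity) pairs, and origin[c], the
    original variable of copy c."""
    renamed: List[List[Tuple[int, bool]]] = []
    origin: List[int] = []
    for clause in clauses:
        renamed.append([])
        for lit in clause:
            renamed[-1].append((len(origin), lit > 0))
            origin.append(abs(lit))
    return renamed, origin


def check_clauses(renamed, y: Sequence[int]) -> bool:
    """First OBDD: reads the copies once each, in order of occurrence."""
    return all(any(bool(y[c]) == pos for c, pos in clause) for clause in renamed)


def check_copies(origin: List[int], y: Sequence[int]) -> bool:
    """Second OBDD: reads the copies grouped by original variable and checks
    that every copy carries the value of the first copy of its variable."""
    first: Dict[int, int] = {}
    for c in sorted(range(len(origin)), key=lambda c: origin[c]):
        if first.setdefault(origin[c], y[c]) != y[c]:
            return False
    return True


def satisfiable_direct(clauses: List[List[int]], n: int) -> bool:
    return any(all(any(bool(x[abs(l) - 1]) == (l > 0) for l in cl) for cl in clauses)
               for x in product((0, 1), repeat=n))


def satisfiable_via_copies(clauses: List[List[int]]) -> bool:
    renamed, origin = rename(clauses)
    return any(check_clauses(renamed, y) and check_copies(origin, y)
               for y in product((0, 1), repeat=len(origin)))
```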

Since a 1-IBDD is simply an OBDD, the example π-MATRIX implies that the
synthesis operations on k-IBDD's are intractable even for k = 1 if the constructed
program must also be a k-IBDD. Naturally, the synthesis operations on a pair of
k-IBDD's are easy if we allow the constructed program to be a 2k-IBDD. The same
statements are true for read-k-times programs.

2.3 Previous lower bounds 

The restriction of limited reading is severe enough that in contrast to the case of 
arbitrary branching programs, many exponential lower bounds have been proved for 
explicit functions, some of the functions quite simple. 

2.3.1 For oblivious programs 

Exponential lower bounds for the size of OBDD's are known for many functions, in 
particular the functions HWB ("Hidden-Weighted-Bit"), ACH ("Achilles-Heel"), and
integer multiplication, MULT, (all defined later), for which lower bounds were proved
specifically for OBDD's. Krause [Kr91] proves lower bounds for other functions. We 
will have more to say about these lower bounds in Sections 2.4.2, 2.5, and 2.6. Of 
course, all other lower bounds mentioned below for stronger models imply a fortiori 
equally strong lower bounds for OBDD's. 

Also, in a very different vein, Alon and Maass [AM88] prove lower bounds for ar- 
bitrary oblivious programs of linear length, which do not obey any restriction on the 
number of times a variable is read. Their lower bound is discussed in Section 2.5.3. 
In a similar spirit, Krause and Waack [KW91] show that any oblivious program of linear length for the problem of directed s-t connectivity requires exponential size; in
[KMW92], similar lower bounds are proved for such programs with nondeterminism 
added. 

Using a lemma from [AM88], and the communication complexity arguments outlined in Section 2.5.1, Gergov [Ge94] proves that computing MULT requires size $2^{\Omega(n)}$
for arbitrary oblivious programs of linear length, even with nondeterministic AND,
OR, or PARITY nodes.

2.3.2 For read-once programs 

There has also been great success in proving lower bounds on the size of read-once 
programs. Many of the functions that require exponential size are very simple; some 
are easily computed with mere read-twice programs. 

Masek [Ma76] was the first to consider read-once programs, proving a lower bound
of $\Omega(m^2)$ on the size of any program determining whether $\sum_{i=1}^{n} x_i = m$. Žák [Za84] and
later Wegener [We88, We87] proved lower bounds of $2^{\Omega(n)}$ for the function ½-CLIQUE
of determining whether a graph on n nodes contains a clique of size n/2, and also for the
function ½-CLIQUE-ONLY, of determining whether a graph on n nodes contains an
n/2-clique and no further edges. (For comparison, there is a simple read-twice program
for ½-CLIQUE-ONLY of size $O(n^3)$.) Dunne [Du85] proved a lower bound of $2^{\Omega(n)}$ for
the problems of determining whether a graph on n nodes contains a Hamiltonian cycle
and determining whether it contains a perfect matching. Simon and Szegedy [SS93],
in order to demonstrate their lower bound technique, proved a lower bound of $2^{\Omega(n)}$
for the problem of determining whether a graph on n nodes is (n/2)-regular. Note
that none of these bounds is fully exponential, since the number of input variables,
one for each edge, is $\binom{n}{2}$. Babai, Hajnal, Szemerédi and Turán [BHST87] proved an
asymptotically optimal lower bound of $2^{\Omega(n^2)}$ for computing the parity of the number
of triangles in a graph on n nodes; Simon and Szegedy [SS93] simplify and refine their
analysis, improving the constant in the exponent.

2.3.3 For nondeterministic programs, read-once and read-k-times

Exponential lower bounds for explicit functions have also been proved for nondeterministic read-once branching programs. Krause, Meinel, and Waack [KMW91] (see also
[Ju89]) give a lower bound of $\binom{n}{n/2} = 2^{\Omega(n)}$ for the function π-MATRIX. (It was
known earlier that this function required exponential-size deterministic read-once programs; see [Kr91, p. 10] and [Ju86].) Also, Borodin, Razborov and Smolensky [BRS93]
prove a lower bound of $2^{\Omega(n)}$ for the functions ½-CLIQUE and ½-CLIQUE-ONLY.
Note that the complement of ½-CLIQUE-ONLY can be computed by nondeterministic
read-once programs of polynomial size.

Okolnishnikova [Ok91] proves that computing the characteristic function of the
Bose-Chaudhuri codes requires deterministic read-k-times programs of size exponential
in $\Omega(\sqrt{n}/k^k)$. Borodin et al. [BRS93] exhibit, for any k, a function that requires
nondeterministic read-k-times programs of size exponential in $\Omega(n/4^k)$. Jukna [Ju92]
extends the results of [BRS93] and [Ok91] to show that the function from [Ok91]
requires nondeterministic read-k-times programs of size exponential in $\Omega(\sqrt{n}/k^{2k})$ even
though its complement can be computed by nondeterministic read-once programs of
polynomial size.

Also, in [MW95], lower bounds are proved for nondeterministic programs in which 
each path obeys a bound on the number of alternations between sets of variables. 

2.4 Comparing the models: classes and structural results 

In this section, we will compare the classes of functions that are computable by
polynomial-size programs of the various types. We will use sans-serif font to denote the
class of functions computable in polynomial size by the named model. For instance, we
will use OBDD to denote the class of functions computable by OBDD's of polynomial
size and READ-k for the functions computable by read-k-times programs of polynomial
size. We will also need a notation for the union over all constants k:

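Definition 3

$$\text{C-OBDD} := \bigcup_{k \in \mathbb{N}} k\text{-OBDD}$$

$$\text{C-IBDD} := \bigcup_{k \in \mathbb{N}} k\text{-IBDD}$$

$$\text{READ-C} := \bigcup_{k \in \mathbb{N}} \text{READ-}k$$

(where C is for "constant").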

We will use OBLIV-LINEAR to denote the class of functions computable with oblivious
programs of linear length and polynomial size. Note that

k-OBDD ⊆ C-OBDD ⊆ C-IBDD ⊆ OBLIV-LINEAR.

The results presented in this section are summarized in Figure 2.1, which gives the 
inclusion relations of these various classes. 

2.4.1 Hierarchies in k 

It is known that the hierarchy over k of functions computable by k-OBDD's of polynomial
size is strict: k-OBDD ⊊ (k+1)-OBDD [BSSW95]. For the case k = 1, we
may refer to the function HWB, described below, which is in 2-OBDD but not in OBDD.
For k-IBDD's the hierarchy is also strict: k-IBDD ⊊ (k+1)-IBDD [BSSW95]. These
lower bounds are based on the well-known "rounds hierarchy" for communication complexity
exhibited by the "k-pointer-chasing" function, k-PTR, on bipartite graphs
[PS82, DGS84, Mc86, HR88, NW91] (in particular the result of [NW91]).

It is not known whether the corresponding hierarchy for read-k-times programs is
strict, except for the case k = 1, where we have seen that π-MATRIX ∉ READ-1
but π-MATRIX ∈ 2-IBDD ⊆ READ-2. Simon and Szegedy [SS93] conjecture that
the problem of testing the regularity of hypergraphs, which (for the case of ordinary
graphs) they showed separates READ-1 from READ-2, will separate the levels of this
hierarchy. We reconsider this question in Chapter 4.

2.4.2 Comparing the classes across models 

OBDD ⊊ READ-1; OBDD ⊊ 2-OBDD

It can be shown that the inclusion OBDD ⊆ READ-1 is proper; that is, the ordering
restriction does in fact limit the computational power of read-once programs. Demonstrating
this separation is the function HWB(x) ("Hidden-Weighted-Bit"), which returns
$x_i$ if there are $i$ ones in $x$ and 0 otherwise. HWB is computable in READ-1 by
a clever algorithm that works its way in from the outermost bits of x; it is also easily
computed in 2-OBDD. A standard lower bound argument shows that HWB ∉ OBDD
([Br91]; see Section 2.5.1).
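As a concrete illustration, here is a minimal Python sketch of HWB together with a simulation of the two-pass strategy behind HWB ∈ 2-OBDD. The function names are ours, inputs are assumed to be lists of bits $x_1, \ldots, x_n$ stored 0-indexed, and the second function only mimics the 2-OBDD: it reads the variables twice in the same order and carries only a counter between steps, which is what keeps the width linear.

```python
def hwb(x):
    """Hidden-Weighted-Bit: x holds x_1..x_n (0-indexed list); return x_i for i = |x|, or 0 if |x| = 0."""
    w = sum(x)
    return x[w - 1] if w > 0 else 0


def hwb_two_pass(x):
    """Mimic a 2-OBDD for HWB: two passes over x_1..x_n in the same fixed order.

    Between levels only a counter with at most n + 1 values is carried (plus one
    remembered bit in pass 2), so each level needs O(n) nodes.
    """
    # Pass 1: count the ones.
    w = 0
    for bit in x:
        w += bit
    # Pass 2: same order again; remember the bit sitting at position w.
    result = 0
    for i, bit in enumerate(x, start=1):
        if i == w:
            result = bit
    return result


print(hwb([1, 0, 1, 1]), hwb_two_pass([1, 0, 1, 1]))  # prints: 1 1
```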

Also, it is shown in [BHR95] (see also [BSSW93]) that ISA ∉ OBDD, where
$\mathrm{ISA}(x, y)\colon \{0,1\}^n \times \{0,1\}^{\lg n} \to \{0,1\}$ is the "Indirect-Storage-Access" function, which
returns $x_i$, where $i$ is the integer represented by the $y$th block of $\lg n$ bits of $x$, if
$0 \le y < n/\lg n$, and returns 0 if $n/\lg n \le y < n$. It is easy to see that ISA ∈ READ-1
and ISA ∈ 2-OBDD.
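A small sketch of ISA and of one read-once evaluation strategy follows. It assumes (our conventions, not the text's) that $n$ is a power of 2 with $\lg n$ dividing $n$, that $y$ is given as an integer, and that each block is read most significant bit first. The read-once version reads the block selected by $y$ and then $x_i$, so which $x$-variables it reads depends on the input, yet it never reads a variable twice.

```python
import math


def _block_value(bits):
    """Integer represented by a list of bits, most significant bit first (an assumed convention)."""
    value = 0
    for b in bits:
        value = 2 * value + b
    return value


def isa(x, y):
    """Indirect-Storage-Access on n bits x_0..x_{n-1} and an integer y in [0, n)."""
    n = len(x)
    b = int(math.log2(n))           # block length lg n; assumes n = 2^b and b divides n
    if y >= n // b:                 # y points past the last block of x
        return 0
    i = _block_value(x[y * b:(y + 1) * b])
    return x[i]


def isa_read_once(x, y):
    """Evaluate ISA reading each x_j at most once (y, i.e. its lg n bits, is read first)."""
    n = len(x)
    b = int(math.log2(n))
    seen = {}                       # x-variables already read on this path

    def read(j):
        assert j not in seen, "variable read twice"
        seen[j] = x[j]
        return seen[j]

    if y >= n // b:
        return 0
    i = _block_value([read(j) for j in range(y * b, (y + 1) * b)])
    # x_i may lie inside block y itself; then its value is already known.
    return seen[i] if i in seen else read(i)
```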

k-OBDD ⊄ READ-1 for k > 1.

Furthermore, the classes READ-1 and k-OBDD are incomparable (for any constant
k > 1); their models may be thought of as orthogonal restrictions of read-k-times
programs. 2-OBDD is separated from READ-1 by the function MHWB ("Multiple-Hidden-Weighted-Bit"),
defined on three n-bit vectors x, y, and z as a:^!.^^©^^.)-^©^.^.)-^!,
where |x| is the Hamming weight of x and the sums are computed modulo n. MHWB
has a natural read-twice algorithm in which the variables may be read in the same order each time,
so MHWB ∈ 2-OBDD. In [BHR95], it is shown that MHWB ∉ READ-1. Krause
[Kr91, Remark 5.3] gives a different function which separates 2-OBDD from READ-1.
[Figure 2.1 diagram: node labels include OBLIVIOUS LINEAR-LENGTH, C-OBDD, 4-OBDD, 3-OBDD, and 2-OBDD; one arrow is labeled π-MATRIX.]

Figure 2.1: The inclusion relations among the classes. "C → D" means class C is contained
in class D; inclusions that can be inferred by transitivity are not shown. Arrows
labeled with problems denote proper inclusions, where the labeling problem separates the
two classes. (Problems in parentheses denote separations that can be inferred from others.)
The separations denoted with dotted lines show that further inclusions do not hold.
This result is in a sense best possible since 2-OBDD ⊆ READ-2. In the other
direction, we have

READ-1 ⊄ C-OBDD.

In [BSSW93], an exponential lower bound is proved for the size of k-OBDD's for the
function ACH ("Achilles-Heel"), defined on $2n + \lg n$ Boolean variables as

$$\mathrm{ACH}(x_0, \ldots, x_{n-1};\, y_0, \ldots, y_{n-1};\, z_1, \ldots, z_{\lg n}) =
\begin{cases}
\bigvee_{0 \le j < n} (x_j \wedge y_j) & \text{if } z = 0,\\
\bigwedge_{0 \le j < n} (x_j \vee y_{j+z}) & \text{if } z \ne 0,
\end{cases}$$

where $z$ is the integer represented in binary by $z_1 \ldots z_{\lg n}$ and the sum $j + z$ is computed
modulo $n$. ACH is easily seen to be in READ-1, but a standard lower bound argument
shows ACH is not in k-OBDD for any constant k [AGD91, BSSW93].
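For experimentation, here is a Python sketch of ACH and of the read-once strategy described in Section 2.5.4 (read $z$ first, then the pairs $(x_j, y_{j+z})$). Encoding $z$ as an integer and the function names are our assumptions. Because $j \mapsto j + z \bmod n$ is a bijection, each variable is read at most once, but the order of reads depends on $z$.

```python
def ach(x, y, z):
    """Achilles-Heel; x and y are lists of n bits, z an integer in [0, n)."""
    n = len(x)
    if z == 0:
        return int(any(x[j] and y[j] for j in range(n)))
    return int(all(x[j] or y[(j + z) % n] for j in range(n)))


def ach_read_once(x, y, z):
    """Read z first, then the pairs (x_j, y_{j+z}) in order; no variable is read twice.

    For each fixed z the rest behaves like an OBDD of constant width, so the whole
    program has polynomial size, but its variable order depends on z.
    Returns (value, order) where order lists the variables read on this path.
    """
    n = len(x)
    order = []
    for j in range(n):
        order += [("x", j), ("y", (j + z) % n)]
        xj, yj = x[j], y[(j + z) % n]
        if z == 0 and xj and yj:
            return 1, order          # some x_j AND y_j holds
        if z != 0 and not (xj or yj):
            return 0, order          # some clause x_j OR y_{j+z} fails
    return (0 if z == 0 else 1), order
```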

Krause [Kr91, Remark 5.4] gives a different function which separates C-OBDD from 
READ-1. 

The separation READ-1 ⊄ C-OBDD is subsumed by the following result:



READ-1 ⊄ OBLIV-LINEAR.

This very strong separation is shown using the powerful technique of Alon and Maass
[AM88]. They exhibit a function SEQ of 4n bits that is easily in READ-1 but cannot
be computed by any polynomial-size oblivious program of length O(n) (see Section 2.5.3). This result
exhibits most strongly how severe a computational restriction obliviousness is.

This result is also best possible since OBLIV-LINEAR is the largest of our classes
not containing READ-1.

2-IBDD ⊄ C-OBDD.

Clearly, k-OBDD ⊆ k-IBDD for each k; conversely, however, 2-IBDD ⊄ C-OBDD.
Again, the separating function is π-MATRIX: π-MATRIX ∈ 2-IBDD easily, but
π-MATRIX ∉ C-OBDD. This lower bound is claimed in [Kr91, Remark 5.5], but
to the best of our knowledge no proof has appeared, so we give one in Section 2.5.1
(Theorem 1).
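The two orderings behind π-MATRIX ∈ 2-IBDD can be sketched as follows. This is a Python simulation under our own naming, not a branching-program construction: the first pass reads the matrix row by row and checks that each row has exactly one 1, and the second reads it column by column. In each pass only a counter capped at 2 is carried, which is why each ordering needs only constant width.

```python
def pi_matrix(m):
    """1 iff the n x n 0/1 matrix m is a permutation matrix."""
    n = len(m)
    rows_ok = all(sum(row) == 1 for row in m)
    cols_ok = all(sum(m[i][j] for i in range(n)) == 1 for j in range(n))
    return int(rows_ok and cols_ok)


def pi_matrix_two_orders(m):
    """Mimic a 2-IBDD: one pass row by row, then one pass column by column."""
    n = len(m)
    for i in range(n):             # first ordering: row-wise
        count = 0
        for j in range(n):
            count = min(count + m[i][j], 2)
        if count != 1:
            return 0
    for j in range(n):             # second ordering: column-wise
        count = 0
        for i in range(n):
            count = min(count + m[i][j], 2)
        if count != 1:
            return 0
    return 1
```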

This result indicates that requiring the order to be the same permutation repeated k times,
rather than k different permutations, really is a computational restriction.

Finally, we mention that some functions that are provably outside these classes are 
easily contained in some of their nondeterministic counterparts. HWB, for example, 
while not in OBDD, is easily computed by a nondeterministic OBDD that initially 
branches into n different deterministic OBDD's, all with a common ordering. 
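The nondeterministic OBDD for HWB can be simulated as an OR over the $n$ initial guesses. In this sketch (names ours) branch $i$ reads $x_1, \ldots, x_n$ in the common order, counts ones, and remembers $x_i$; it accepts exactly when $|x| = i$ and $x_i = 1$, so the OR over all $i$ equals HWB, and each branch needs only $O(n)$ width.

```python
def hwb(x):
    """Hidden-Weighted-Bit (x_1..x_n stored 0-indexed)."""
    w = sum(x)
    return x[w - 1] if w > 0 else 0


def branch(x, i):
    """Deterministic OBDD for guess i: read x_1..x_n in the common order,
    counting ones and remembering x_i; accept iff |x| = i and x_i = 1."""
    count, xi = 0, 0
    for pos, bit in enumerate(x, start=1):
        count += bit
        if pos == i:
            xi = bit
    return int(count == i and xi == 1)


def hwb_nondeterministic(x):
    """OR over the n initial nondeterministic choices i = 1, ..., n."""
    return int(any(branch(x, i) for i in range(1, len(x) + 1)))
```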

2.5 Lower bound techniques 

In this section, we describe the techniques that have been used to prove lower bounds
in the various oblivious models: OBDD's, both deterministic and nondeterministic,
k-OBDD's and k-IBDD's, and arbitrary oblivious programs of linear length. For completeness
as well as for demonstration, we supply a proof of Theorem 1, announced
in [Kr91] without proof, and also prove Theorem 2, extending in a simple way the
result of [BSSW93]. We compare these methods with lower bounds for non-oblivious
programs, but defer a detailed description of the latter until the presentation of our
own lower bound in Chapter 3. The technique of [BRS93] for proving lower bounds
for read-k-times programs will be mentioned only briefly in Chapter 4, when we outline
approaches to some open problems.

2.5.1 For OBDD's, k-OBDD's, and k-IBDD's

Lower bounds for OBDD's follow a simple strategy: show that for any $Y \subseteq X$ of some
fixed size (say $m = m(n)$), there are many (say $2^{\Omega(n)}$) subfunctions on $\overline{Y}$. If the first
m variables read by an OBDD are Y, clearly any two assignments to Y that induce
different subfunctions on $\overline{Y}$ must lead to different nodes. Since this lower bound holds
for any set Y of size m, $2^{\Omega(n)}$ is a lower bound on the number of nodes of any OBDD.
Most lower bounds for OBDD's show explicitly that there are many subfunctions by
exhibiting, for any Y of the stated size, an exponential number of settings to Y such
that for any two, there is a setting to $\overline{Y}$ on which the respective subfunctions differ.
This argument may be nicely interpreted in terms of communication complexity:
one party gets the values of the bits in Y and the other party the bits in $\overline{Y}$. The second
party must compute the value of the function based on a single message sent by the
first party. An OBDD gives a communication protocol where Y is the first m variables
in its ordering. If the program has w nodes at the level immediately following the nodes
of Y, then the message has lg w bits. Thus, if the one-way communication complexity
is linear for every Y of size m, then the function requires OBDD's of exponential
size. Bryant [Br91] uses a simple argument of this form to prove that HWB requires
exponential-size OBDD's.
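The subfunction-counting argument can be checked by brute force on small examples. The helper below (our own, exponential-time, for illustration only) counts the distinct subfunctions on $\overline{Y}$ induced by the settings to $Y$; the settings to $Y$ must reach at least this many distinct nodes (sinks included). The example function eq, equality of two 3-bit halves, is our choice: reading one whole half first forces 8 subfunctions, while reading $x_0, x_1, x_3$ first leaves only 3, which shows how much the choice of $Y$ (that is, of the ordering) matters.

```python
from itertools import product


def count_subfunctions(f, n, Y):
    """Number of distinct subfunctions on the complement of Y induced by settings to Y.

    f takes a list of n bits. Each distinct subfunction needs its own node right
    after Y is read, and the one-way message needs about lg of this many bits.
    """
    Y = list(Y)
    rest = [v for v in range(n) if v not in Y]
    tables = set()
    for a in product((0, 1), repeat=len(Y)):
        table = []
        for b in product((0, 1), repeat=len(rest)):
            x = [0] * n
            for v, bit in zip(Y, a):
                x[v] = bit
            for v, bit in zip(rest, b):
                x[v] = bit
            table.append(f(x))
        tables.add(tuple(table))   # the truth table identifies the subfunction
    return len(tables)


def eq(x):
    """Equality of the two halves of a 6-bit input (our illustrative example)."""
    return int(x[0:3] == x[3:6])
```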



Commonly, it is proved that in fact the unlimited-round, two-way communication
complexity of the function is linear for any Y of size m. This argument is sometimes
made in terms of what is called a "fooling set" for the function f with respect to Y.
For $Y \subseteq X$ and $x, x' \in \{0,1\}^n$, let $x_Y$ denote the value of x on the variables in Y,
and let $x_Y x'_{\overline{Y}}$ denote the n-bit input string equal to x on the variables in Y and equal
to x' on the variables in $\overline{Y}$. A fooling set $F \subseteq \{0,1\}^n$ for f with respect to Y has
the property that for all $x \ne x' \in F$, $f(x) = f(x') = 1$ and either $f(x_Y x'_{\overline{Y}}) = 0$ or
$f(x'_Y x_{\overline{Y}}) = 0$. Thus, if an OBDD obeys an ordering in which the variables in Y are
read first, the setting $x_Y$ cannot lead to the same node at level m as the setting $x'_Y$,
since either $f(x_Y x'_{\overline{Y}}) \ne f(x'_Y x'_{\overline{Y}}) = 1$ or $f(x'_Y x_{\overline{Y}}) \ne f(x_Y x_{\overline{Y}}) = 1$. If for every Y of size
m there is a fooling set of exponential size, then the function requires exponential-size
OBDD's. Furthermore, the existence of a fooling set F for Y implies that the
(unrestricted) communication complexity with respect to the partition $Y \cup \overline{Y}$ is at least $\lg |F|$.
This is seen by inspecting the associated matrix M, whose rows are indexed by the
values $x_Y$ and whose columns are indexed by the values $x'_{\overline{Y}}$ for $x, x' \in F$, with $M_{x,x'} = f(x_Y x'_{\overline{Y}})$. Note that the
definition of F implies that $x_Y \ne x'_Y$ and $x_{\overline{Y}} \ne x'_{\overline{Y}}$ for $x \ne x'$, so M is a square matrix
of dimension $|F|$. M has 1's on its diagonal because $f(x_Y x_{\overline{Y}}) = f(x) = 1$, and since either
$M_{x,x'} = 0$ or $M_{x',x} = 0$, no two 1's on the diagonal can appear in the same all-1's minor.
Since a communication protocol of b bits partitions the 1's of the matrix into at most $2^b$ all-1's
minors, the communication complexity is at least $\lg |F|$.
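The fooling-set property and the diagonal structure of M can likewise be verified by brute force. In this sketch, inputs are tuples of bits indexed by variable number, and the example (equality of two $k$-bit halves, with $Y$ the first half) is ours; its fooling set $\{aa\}$ yields the identity matrix, so any protocol needs at least $\lg |F| = k$ bits. A parity example shows a set that fails the property.

```python
from itertools import product


def mix(x, xp, Y):
    """The input equal to x on the variables in Y and to xp on the other variables."""
    return tuple(x[v] if v in Y else xp[v] for v in range(len(x)))


def is_fooling_set(f, F, Y):
    """Check the fooling-set property for f with respect to Y, as defined above."""
    Y = set(Y)
    if any(f(x) != 1 for x in F):
        return False
    for a in range(len(F)):
        for b in range(a + 1, len(F)):
            x, xp = F[a], F[b]
            if f(mix(x, xp, Y)) == 1 and f(mix(xp, x, Y)) == 1:
                return False
    return True


def fooling_matrix(f, F, Y):
    """M[a][b] = f(x_Y x'_{Ybar}) for x = F[a], x' = F[b]; the diagonal is all 1's."""
    Y = set(Y)
    return [[f(mix(x, xp, Y)) for xp in F] for x in F]


# Illustrative example (ours): equality of two k-bit halves, Y = the first half.
k = 3
eq = lambda x: int(x[:k] == x[k:])
F_eq = [a + a for a in product((0, 1), repeat=k)]
```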
For k-OBDD's

If, for any set Y of size m, there exists a fooling set of size $2^{\Omega(n)}$, then it is also easy
to see that the function requires k-OBDD's of size $2^{\Omega(n)/2k}$ [Kr91]. A k-OBDD gives a
communication protocol of 2k rounds; the total communication is $2k \lg(\text{width})$, which
must be at least $\Omega(n)$, giving the desired bound on the width and hence the size.

For example, we give a simple proof that π-MATRIX ∉ k-OBDD for any constant k.

Theorem 1 π-MATRIX ∉ C-OBDD.

Proof: We will show that for any partition of the $n^2$ variables into two sets X and $\overline{X}$
of equal size, $2^{\Omega(n)}$ is a lower bound on the rank of the matrix of the communication
complexity game where player I gets X and player II gets $\overline{X}$.

First notice that for certain partitions, the proof is easy. For example, consider the
partition where player I gets the variables in rows $1, \ldots, n/2$ and player II gets the
variables in rows $n/2+1, \ldots, n$. We may even restrict our attention to only those inputs
where each row has exactly one 1 and each player's $n/2$ 1's lie in distinct columns. The inputs to
the two players then correspond merely to subsets of $n/2$ columns; the players accept if
the subsets are disjoint and reject otherwise. It is easy to see that this problem requires
$\lg \binom{n}{n/2}$ bits of communication, since the $\binom{n}{n/2} \times \binom{n}{n/2}$ matrix of the communication
game is diagonal (when player II's subsets are indexed by their complements).

Our proof will follow the spirit of this strategy for arbitrary partitions. Let $r_i$ be
the number of X-variables in row i. Order the rows so that $r_1 \le r_2 \le \cdots \le r_n$. We
have $|X| = \sum_i r_i = n^2/2$. Let rows $n/2+1, \ldots, n$ be the "top half" of the matrix.

First consider the case that the top half contains at least 3/4 of the X-variables:
$\sum_{i=n/2+1}^{n} r_i \ge 3n^2/8$. In this case, at least 2/3 of the columns have at least n/8 X-variables
in their top halves: otherwise, the number of X-variables in the top half is
less than

$$\frac{2n}{3} \cdot \frac{n}{2} + \frac{n}{3} \cdot \frac{n}{8} = \frac{3n^2}{8},$$

a contradiction. Since the top half contains exactly half of all the variables, the "bottom
half" (rows $1, \ldots, n/2$) has at least 3/4 of the variables in $\overline{X}$. It follows that at least
2/3 of the columns have at least n/8 $\overline{X}$-variables in their bottom halves. Therefore
at least n/3 columns contain at least n/8 X-variables in the top half and at least n/8
$\overline{X}$-variables in the bottom half. Let C be any subset of n/4 of these columns.

For any subset C' of half the columns of C, there is a setting to X in which exactly
one X-variable in the top half of each column of C' is 1 and each such 1 appears in
a different row. This is because $|C'| = n/8$ and there are at least n/8 X-variables in
the top half of each column of C. Let us restrict attention to particular settings to
the variables in C. On X, these settings shall be as described above (for some C') in
the top half, and shall be 0 in the bottom half. On $\overline{X}$, these settings shall be 0 in the
top half, and in the bottom half shall contain 1's in n/8 different rows and different
columns C''.

If these two subsets C' and C'' of columns are complementary ($C' \cup C'' = C$),
then there is a setting to the remaining variables for which the input is a permutation
matrix, making the function 1. If these two subsets of columns are not complementary
($C' \cup C'' \subsetneq C$), some column in C contains both a 1 in its top half and a 1 in its
bottom half, so that for all settings to the remaining variables, the function is 0. We
partition these settings to X (player I's inputs) into $\binom{n/4}{n/8}$ blocks, according to which subset
of C contains the 1's in X. Similarly, we partition the settings to $\overline{X}$ (player II's inputs).
Thus the communication complexity matrix associated with these inputs is comprised
of $\binom{n/4}{n/8}^2$ minors, and only the minors on the diagonal contain 1's. This matrix clearly
has rank at least $\binom{n/4}{n/8} = 2^{\Omega(n)}$.

Now consider the case that $\sum_{i=n/2+1}^{n} r_i < 3n^2/8$. In this case, the bottom half has at
least $n^2/8$ X-variables, implying that

$$r_{n/2} \ge \frac{n^2/8}{n/2} = n/4.$$

Since $r_i \ge r_{n/2}$ for $i > n/2$, it follows that there are at least 4n/10 rows in the top half
with at most 7n/8 X-variables: otherwise, there are more than

$$\frac{4n}{10} \cdot \frac{7n}{8} + \frac{n}{10} \cdot \frac{n}{4} = \frac{3n^2}{8}$$

X-variables in the top half, a contradiction. Let R be these 2n/5 rows, each containing
at least n/4 and at most 7n/8 X-variables.
Let $C_L$ be columns $1, \ldots, n/2$ (the "left half") and $C_R$ be columns $n/2+1, \ldots, n$
(the "right half"). Since each row in R has more X-variables in either $C_L$ or $C_R$, at
least half of the rows in R have most of their X-variables in the same half, say $C_L$. Each of these
n/5 rows has at least n/8 X-variables in the left half and at most 7n/16 X-variables
in the right half. Alternatively, each of these rows has at least n/8 X-variables in the
left half and at least n/16 $\overline{X}$-variables in the right half.

We now fix some n/8 of these rows, and the rest of the proof proceeds as in the first
case, yielding a lower bound of $\binom{n/8}{n/16} = 2^{\Omega(n)}$. ■

It is easy to see that π-MATRIX has OBDD's of size $O(n 2^n)$: the variables are
read column-wise, easily ensuring that each column has exactly one 1; furthermore,
the OBDD keeps track of the subset of the rows in which 1's have appeared, requiring
width $O(2^n)$. Interestingly, for k-OBDD's, just as the lower bound degrades roughly
by a factor of k in the exponent, yielding $2^{\Omega(n/k)}$, the upper bound can similarly be
improved by a factor of k in the exponent. Construct a k-OBDD of width $2^{O(n/k)}$
by reading the variables column-wise, but keeping track of only n/k rows at a time:
partition the rows into k sets of size n/k each, and in segment $i = 1, \ldots, k$, keep track
of the subset of the ith set of rows in which 1's have appeared. Accept only if, in each
segment i, each of the n/k rows of the ith set is found to contain exactly one 1.
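The k-pass construction for π-MATRIX can be simulated as below. This is a sketch under our own assumptions ($k$ divides $n$; names are ours). In each pass only the subset of the current group of $n/k$ rows that already contain a 1 is remembered, which corresponds to the $2^{n/k}$ factor in the width, together with a count for the current column capped at 2.

```python
def pi_matrix(m):
    """1 iff the n x n 0/1 matrix m is a permutation matrix."""
    n = len(m)
    return int(all(sum(r) == 1 for r in m) and
               all(sum(m[i][j] for i in range(n)) == 1 for j in range(n)))


def pi_matrix_k_passes(m, k):
    """Mimic the k-OBDD above: k passes, each reading the matrix column by column."""
    n = len(m)
    assert n % k == 0, "assumes k divides n"
    g = n // k
    for s in range(k):
        group = range(s * g, (s + 1) * g)
        seen = set()                      # rows of this group that already contain a 1
        for j in range(n):                # same column-wise order in every pass
            col = 0
            for i in range(n):
                if m[i][j]:
                    col = min(col + 1, 2)
                    if i in group:
                        if i in seen:
                            return 0      # a second 1 in a tracked row
                        seen.add(i)
            if col != 1:
                return 0                  # column without exactly one 1
        if len(seen) != g:
            return 0                      # some tracked row contains no 1
    return 1
```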

For k-IBDD's

The only lower bound that is proved specifically for IBDD's (i.e., which does not apply 
to linear length oblivious programs more generally) is the lower bound of [BSSW95]. 
They reduce the problem to one of communication complexity in the following manner. 
Given an IBDD, they construct two disjoint subsets of the variables by considering the 
levels of the IBDD one at a time. Each level disqualifies at most one-half of the variables 
in each set, so that after a constant number of levels, still a constant fraction $2^{-k}$ of
the variables are retained. They argue that the problem restricted to these variables
is a smaller version of the original problem, and hence the known linear lower bound 
on the communication complexity applies. 

To demonstrate, we give an easy lower bound which has not appeared in the literature.
The proof is very similar to the lower bounds of [BSSW95] and [Ge94]. Recall
that [BSSW93] showed ACH ∉ C-OBDD; we will show that ACH ∉ C-IBDD.

Theorem 2 ACH ∉ C-IBDD.

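Proof: Consider a k-IBDD G computing ACH. We will show that G has size at least
$2^{\Omega(n/k2^{2k})}$. Recall from Section 2.4.2 that

$$\mathrm{ACH}(x, y, z) = \bigwedge_{0 \le j < n} (x_j \vee y_{j+z}) \ \text{if } z \ne 0, \quad\text{and}\quad \bigvee_{0 \le j < n} (x_j \wedge y_j) \ \text{if } z = 0.$$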

We think of G as being composed of k segments, each with n levels corresponding
to a permutation of the variables. Suppose we could show that for some z there are
subsets of variables

$$X_S = \{x_i : i \in S\} \subseteq X \quad\text{and}\quad Y_S = \{y_{i+z} : i \in S\} \subseteq Y$$

of size at least $n/2^{2k}$ such that for each segment of G, either all variables of $X_S$ appear
before $Y_S$ or vice versa. Then we may invoke the communication complexity argument
in which the players get 2k rounds or fewer. If $z > 0$, we get a fooling set of size
$2^{|X_S|}$ with respect to $X_S$ by taking inputs ranging over all settings to $X_S$ and where
$y_{i+z} = x_i = 1$ for $i \notin S$ and $y_{i+z} = \bar{x}_i$ for $i \in S$. For each such input $w = (x, y, z)$,
we have $\mathrm{ACH}(w) = 1$, but for two different such inputs $w \ne w'$, we have either
$\mathrm{ACH}(w_{X_S} w'_{\overline{X_S}}) = 0$ or $\mathrm{ACH}(w'_{X_S} w_{\overline{X_S}}) = 0$. Similarly, if $z = 0$, letting $y_{i+z} = x_i = 0$
for $i \notin S$ and $y_{i+z} = \bar{x}_i$ for $i \in S$, for each such input $w = (x, y, z)$ we have $\mathrm{ACH}(w) = 0$,
but for two different such inputs $w \ne w'$ we have either $\mathrm{ACH}(w_{X_S} w'_{\overline{X_S}}) = 1$ or
$\mathrm{ACH}(w'_{X_S} w_{\overline{X_S}}) = 1$. If we can find such z, $X_S$ and $Y_S$ for any given G, then
the communication complexity argument implies that the width of G is exponential
in $(n/2^{2k})/2k$.



We now show that there exist z, $X_S$, and $Y_S$ as desired. Without loss of generality,
suppose the first half of the first segment of G has more X variables than Y variables.
Let $X_1 \subseteq X$ appear in the first half and $Y_1 \subseteq Y$ appear in the second half so that
$|X_1| = |Y_1| \ge n/2$. Now partition the second segment of G in "half" with respect to the
n variables $X_1 \cup Y_1$ only. If the first half contains more X variables than Y variables,
let $X_2 \subseteq X_1$ appear in the first half and $Y_2 \subseteq Y_1$ appear in the second half, so that
$|X_2| = |Y_2| \ge n/4$. Otherwise, let $X_2 \subseteq X_1$ appear in the second half and $Y_2 \subseteq Y_1$
appear in the first half. Repeating this process for the k segments, we finally obtain
$X_k$ and $Y_k$ of size $n/2^k$ with the desired alternation property. Since $X_k \times Y_k$ has size
at least $n^2/2^{2k}$, it contains at least $n/2^{2k}$ pairs $(x_i, y_{i+z})$ for some value of z between 0
and $n - 1$. The $x_i$ in these pairs constitute $X_S$. ■
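The two fooling sets in the proof of Theorem 2 can be checked by brute force for small $n$. The sketch below, with our own names and $z$ given as an integer, builds the inputs described in the proof for a given $S$ and $z$, and checks that all of them have the same ACH value (1 if $z > 0$, 0 if $z = 0$) while any two are separated by one of the two mixed inputs. It does not construct $X_S$ and $Y_S$ from a k-IBDD; $S$ is supplied by hand.

```python
from itertools import product


def ach(x, y, z):
    n = len(x)
    if z == 0:
        return int(any(x[j] and y[j] for j in range(n)))
    return int(all(x[j] or y[(j + z) % n] for j in range(n)))


def fooling_inputs(n, z, S):
    """All settings to X_S, with y_{i+z} = complement of x_i for i in S, and
    x_i = y_{i+z} = 1 (if z > 0) or 0 (if z = 0) for i not in S."""
    fill = 1 if z != 0 else 0
    inputs = []
    for bits in product((0, 1), repeat=len(S)):
        x, y = [fill] * n, [fill] * n
        for i, b in zip(S, bits):
            x[i] = b
            y[(i + z) % n] = 1 - b
        inputs.append((x, y))
    return inputs


def check_fooling(n, z, S):
    """Verify the two properties used in the proof."""
    target = 1 if z != 0 else 0          # ACH value shared by every input in the set
    W = fooling_inputs(n, z, S)
    if any(ach(x, y, z) != target for x, y in W):
        return False
    for a in range(len(W)):
        for b in range(a + 1, len(W)):
            (x, y), (xp, yp) = W[a], W[b]
            # X_S taken from one input, every other variable from the other input
            mix1 = [x[i] if i in S else xp[i] for i in range(n)]
            mix2 = [xp[i] if i in S else x[i] for i in range(n)]
            if ach(mix1, yp, z) == target and ach(mix2, y, z) == target:
                return False
    return True
```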

Note that π-MATRIX does not enjoy the same self-reducibility property: the above
proof applied to the 2-IBDD for computing π-MATRIX finds $X_2$ equal to the variables
in one quadrant of the matrix and $Y_2$ equal to the variables in the diagonally opposite
quadrant. Indeed, for any setting to the remaining variables, only one bit of communi- 
cation between the players is necessary to compute the function: player I checks that 
the top rows and left columns are okay, and player II checks that the bottom rows and 
right columns are okay. 

2.5.2 For nondeterministic BDD's 

Lower bounds for nondeterministic BDD's also follow from the existence of exponential-size
fooling sets: they imply that the function requires exponential-size nondeterministic
OBDD's, when OR gates² or PARITY gates are allowed.

For example, consider an OBDD with OR nodes. We may view the corresponding
communication protocol as containing nondeterministic choices by the players, giving
in effect the OR of many deterministic protocols. Each such deterministic protocol
determines some 1-rectangles (all-1's minors); together, the 1-rectangles of all the protocols
must cover all the 1's of the matrix without covering any of the 0's. Thus the
communication required is at least the logarithm of the "cover number" (the number
of 1-rectangles needed), or equivalently, the logarithm of the rank³ over the Boolean
semiring B ({0, 1} with ∧ and ∨; it is only a semiring, not a ring, because 1 has no additive inverse).
As discussed earlier, the matrix corresponding to a fooling set of size |F| has all 1's
on its diagonal, no two of which may appear in the same all-1's minor, so the cover
number, or the rank over B, is at least |F|.



² The asymmetry with respect to OR/AND occurs because of the choice f(x) = 1 rather than
f(x) = 0 in the definition of fooling sets.

³ The rank of a matrix over a semiring is the smallest number of pairs of (column) vectors (v, w) such
that $M = \sum_i v_i w_i^{T}$. This specializes to the "cover number" in the case of B and to the dimension of
the column space in the case of a field.




Similarly, with PARITY nodes, the communication required in the corresponding
communication game is the logarithm of the rank of the matrix over GF(2). When
column operations make the matrix lower triangular with 1's on its diagonal, as for the
matrix in Bryant's proof in Section 2.6.1, it has full rank over GF(2) as well.
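Ranks over GF(2) are computed by Gaussian elimination; the sketch below (our own helper, storing rows as integer bitmasks) confirms that a lower-triangular matrix with 1's on its diagonal has full GF(2) rank. For contrast, it also shows that the complement of an identity matrix, which has full rank over the reals, has full GF(2) rank in dimension 4 but not in dimension 3, so the field matters.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a 0/1 matrix given as a list of equal-length lists."""
    basis = []                      # reduced rows as ints, sorted by decreasing leading bit
    for row in rows:
        v = 0
        for bit in row:
            v = 2 * v + bit         # pack the row into an integer bitmask
        for b in basis:             # clear each basis vector's leading bit from v
            v = min(v, v ^ b)
        if v:
            basis.append(v)         # v has a new leading bit: it is independent
            basis.sort(reverse=True)
    return len(basis)
```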

With AND nodes we have the dual of OR nodes: the communication complexity
is equal to the nondeterministic communication complexity of the complement of the
function, that is, the logarithm of the rank over B of the matrix with 0's and 1's reversed. Note that for a
particular partition of the variables, this may be exponentially less than in the case of
OR nodes: the function EQUAL?(x, y) with respect to the partition X ∪ Y requires
nondeterministic complexity |x| = |y|, whereas its complement has nondeterministic
complexity 2 lg |x|.

2.5.3 For arbitrary oblivious programs 

Alon and Maass [AM88] prove strong lower bounds for arbitrary 3-way oblivious programs
by analyzing the sequence S in which the variables are read by the levels of
the program. In particular, for any two disjoint subsets of variables S and T, they
consider the number of times this sequence alternates between reading variables of S
and variables of T. They prove a theorem that says if for every two subsets $S \subseteq X$ and
$T \subseteq Y$ with $|S| = |T| = n/2^m$ (where $|X| = |Y| = n$) there are at least m alternations
between S and T, then the sequence must be of length at least $\Omega(nm)$.

They use this theorem to prove a superlinear lower bound on the length of oblivious
branching programs for the "sequence equality function" SEQ, defined on two ternary
vectors x and y of length n where each $x_i$ and $y_i$ may be 0, 1, or 2. $\mathrm{SEQ}(x, y) = 1$
if the subsequence of x obtained by removing the 2's is equal to the subsequence of
y obtained in the same manner. A standard "cut-and-paste" (or "crossing-sequence")
argument shows that in any 3-way branching program⁴ for SEQ and for any S and T
as above, the number ℓ of alternations between S and T must satisfy $w^{\ell} \ge 2^{|S|}$, where
w is the width of the program. So for $w = 2^{n/2^{2m}}$, this yields $\ell \ge 2^m$. In particular
$\ell \ge m$, and so the theorem gives a lower bound of $\Omega(nm)$ on the length of the program.



⁴ This is a branching program in which each node has 3 edges leaving it.
Thus any oblivious program for SEQ of size $2^{o(n)}$ must have superlinear length.
This lower bound for 3-way programs clearly implies the same lower bound for ordinary
branching programs, where each ternary variable $x_i$ is represented by two binary
variables. This implies, for instance, that SEQ ∉ C-IBDD. For comparison, SEQ has
very easy read-once programs, which are non-oblivious, of length n.
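To illustrate the non-oblivious read-once strategy for SEQ, here is a Python sketch (our own, with ternary vectors as lists over {0, 1, 2}). It alternately scans x and y for their next symbol other than 2 and compares the two; the state is just the two scan positions and a pending symbol, every variable is read at most once, and the order of reads depends on where the 2's are.

```python
def seq(x, y):
    """Sequence equality on ternary vectors: compare x and y after deleting all 2's."""
    return int([a for a in x if a != 2] == [b for b in y if b != 2])


def seq_read_once(x, y):
    """Evaluate SEQ reading each x_i and y_i at most once, in a data-dependent order.

    Returns (value, order), where order lists the variables in the order read.
    """
    n = len(x)
    order = []
    i = j = 0
    while True:
        a = None                      # next non-2 symbol of x, if any
        while i < n:
            order.append(("x", i))
            i += 1
            if x[i - 1] != 2:
                a = x[i - 1]
                break
        b = None                      # next non-2 symbol of y, if any
        while j < n:
            order.append(("y", j))
            j += 1
            if y[j - 1] != 2:
                b = y[j - 1]
                break
        if a is None and b is None:
            return 1, order           # both sequences exhausted together
        if a != b:
            return 0, order           # mismatch, or one sequence is longer
```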

In [KMW92], this lower bound for SEQ is extended to nondeterministic oblivious 
programs of linear length. At the same time, a simple co-nondeterministic oblivious 
program (with AND nodes) of linear length is given, showing that as for read-once 
programs (Section 2.2.2), the two types of nondeterminism give different computational 
power. 

Babai, Nisan, and Szegedy [BNS92] in the same spirit improve this length/width 
tradeoff, using their lower bound for multiparty communication complexity to raise 
the lower bound on the length of polynomial-size oblivious programs (for a different 
function) by a factor of lg n. 

2.5.4 For read-once programs 

Note first that the lower bound method for OBDD's is insufficient for read-once programs.
Even though there may be many subfunctions arising from the settings to any
$Y \subseteq X$ of a given size, it may also be that for each Y there is one subfunction that
arises from many of the settings to Y. Since different paths may read the variables of
X in different orders, different sets Y may be the "first" ones read depending upon
the values of the variables. In this case, we have not excluded the possibility that the
first m input bits are read in such a way that the program needs nodes for only the
"large" subfunctions on the various $\overline{Y}$ of size $n - m$.

For example, we saw that for the function ACH, there is a fooling set of size $2^{n/4}$
for any subset of half the X and Y variables (Theorem 2, specialized to OBDD's).
However, there is a simple read-once program that reads the z variables first and then
reads the X and Y variables in the appropriate order, pair by pair. Looking closely
at this program, we see that there are n different subsets of half the X, Y variables
that may be read first. For each subset there is a large fooling set, implying that there
are many possible subfunctions on the remaining variables. However, the values of
the variables (specifically, of the z variables) that give rise to these many subfunctions
cause other paths to be taken through the program. For the path that leads to a given
subset of X, Y, there are only two subfunctions (either a constant function or the induced
ACH function) arising from the many settings to the variables (in Z and the rest of
X, Y) read so far.

In order to prove lower bounds for read-once programs, we must show that not 
only are there many subfunctions, but that each arises in very few ways. Simon and 
Szegedy [SS93] distill this idea into a lemma which may be considered a paradigm 
for proving read-once lower bounds. This technique appears implicitly in the read- 
once lower bounds of [We88, Za84] and explicitly in those of [Ju88, Kr88, Du85]; the 
generalization in [SS93] enables an easier proof of the lower bound of [BHST87] and 
others [We87, Du85, Ju88]. Simon and Szegedy use this technique to reprove a theorem 
of Babai et al. [BHST87], that read-once programs require size $2^{\Omega(n^2)}$ to count modulo
2 the number of triangles in an n-node graph. They also give a simple proof that size
2 n ' w is required to tell whether an n-node graph is ^-regular. Since the lemma is a 
central part of our lower bound for multiplication, we provide a proof in Chapter 3. 

2.6 Integer multiplication 

By integer multiplication, we will refer to the Boolean function $\mathrm{MULT}\colon \{0,1\}^{2n} \to \{0,1\}$
that computes the middle bit in the product of two n-bit integers. That is,
$\mathrm{MULT}(x, y) = z_{n-1}$, where $x = x_{n-1} \cdots x_0$, $y = y_{n-1} \cdots y_0$, and $z = z_{2n-1} \cdots z_0 = xy$
is the product of the integers represented in binary by x and y. The middle bit is the
"hardest" bit, in the sense that if it can be computed by read-once branching programs
(or most any computational model) of size s(n), then any other bit can be computed
with size at most s(2n).
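MULT and the padding argument behind the last claim can be sketched as follows, assuming integers are passed directly rather than as bit vectors (names ours). Bit $j$ of $xy$ is moved to position $2n - 1$ by multiplying $x$ by $2^a$ and $y$ by $2^b$ with $a + b = 2n - 1 - j$, $a \le n$ and $b \le n - 1$, so both factors still fit in $2n$ bits; in a branching program this only fixes some inputs to 0 and renames the others.

```python
def mult(x, y, n):
    """MULT: the middle bit z_{n-1} of the 2n-bit product of n-bit integers x and y."""
    return ((x * y) >> (n - 1)) & 1


def product_bit(x, y, n, j):
    """Bit j (0 <= j < 2n) of x*y, obtained from MULT on 2n-bit inputs."""
    shift = 2 * n - 1 - j         # moves bit j of the product to position 2n - 1
    a = min(n, shift)             # x * 2^a still fits in 2n bits
    b = shift - a                 # b <= n - 1, so y * 2^b fits as well
    return mult(x << a, y << b, 2 * n)
```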

2.6.1 Bryant's lower bound 

Bryant [Br91] gives the following lower bound for MULT; Gergov [Ge94] notices that
the proof holds also for nondeterministic OBDD's, as noted at the end of the proof below.
Theorem 3 MULT ∉ OBDD.

Proof: We will show that with respect to any subset $S \subseteq \{x_1, \ldots, x_n\}$ of size n/2
(corresponding to the first n/2 variables of X read by an OBDD), MULT has a fooling
set of size $2^{n/8}$. The elements of the fooling set differ only in their settings to the $x_i$; the
$y_i$ are fixed so that the multiplication is reduced to computing the sum of two integers,
one corresponding to a subsequence of $x_1, \ldots, x_{n/2}$ and the other corresponding to a
subsequence of $x_{n/2+1}, \ldots, x_n$. The nth bit of the product is the high-order bit in this
sum.

Choose these two subsequences so that for each i, the ith bit of one is in S and the
ith bit of the other is in $\overline{S}$, and they are equally far apart in x for all i. To do this, let

$$S_L = S \cap \{x_1, \ldots, x_{n/2}\} \quad\text{and}\quad S_R = S \cap \{x_{n/2+1}, \ldots, x_n\},$$

and similarly define $\overline{S}_L$ and $\overline{S}_R$ for $\overline{S}$. It is easy to show that

$$|S_L \times \overline{S}_R| + |\overline{S}_L \times S_R| \ge n^2/8,$$

and since $1 \le |i - j| < n$ for each $(x_i, x_j) \in (S_L \times \overline{S}_R) \cup (\overline{S}_L \times S_R)$, by the pigeonhole principle
at least n/8 of these pairs are the same distance apart, which gives the desired subsequences.

Exactly two bits of y are set to 1, in such a way that these two subsequences "line
up" and so that the carry out of their high-order bit corresponds to the nth bit in the
product of x and y. The bits of x not contained in either subsequence are set to 0,
except that those in $\{x_{n/2+1}, \ldots, x_n\}$ that lie "in between" the bits of the subsequence are set to 1.
This causes carry bits of the addition to propagate as desired and thereby reduces
the multiplication of x and y to the addition of the two integers determined by the
subsequences. See Figure 2.2.

We may think of the addition of these two integers as the addition of an integer
determined by the setting to S and an integer determined by the setting to $\overline{S}$. The
fooling set ranges over all settings to the integer determined by S. Each of these two
integers may take on any value between 0 and $2^{n/8} - 1$, in turn making the nth bit of
the product 1 if their sum is at least $2^{n/8}$ and 0 otherwise.

The corresponding matrix has rows indexed by all $2^{n/8}$ settings to S's integer and
columns indexed by all $2^{n/8}$ settings to $\overline{S}$'s integer. After deleting the 0-column and
0-row and indexing appropriately, this matrix is lower-triangular with all 1's in the
lower half. It thus has full rank over B and over GF(2), and so does its complement.
It follows that MULT requires exponential-size OBDD's and k-OBDD's even if OR,
PARITY, or AND nodes are present. ■

Figure 2.2: The multiplication of x and y is reduced to computing the carry bit
in the sum of the integers represented by the two subsequences $x_i x_j x_k$
and $x_p x_q x_r$. For each corresponding pair $(x_i, x_p)$, $(x_j, x_q)$, and $(x_k, x_r)$,
one variable is in S and the other in $\overline{S}$. The fooling set ranges over all
settings to the variables in S, with each variable's "partner" getting the
complementary setting. The remaining variables are set as shown in order
to achieve the desired reduction.



Gergov [Ge94] further generalizes Bryant's lower bound for MULT to arbitrary
oblivious programs of linear length by using the main lemma from [AM88]. For any
program of length kn, the lemma implies the existence of two "large" disjoint subsets
of X (of size $n/k2^{2k}$) such that there are few ($O(k)$) levels where the program changes
from reading variables of one set to reading variables of the other set. With the problem now
reduced to one of communication complexity with 2k rounds, it is easy to carry through the
rest of Bryant's proof to find a fooling set of size $2^{\Omega(n/k2^{2k})}$. Thus, the program has size
at least $2^{\Omega(n/k^2 2^{2k})}$. As reasoned above, this bound holds even if nondeterministic nodes
are present.
2.6.2 The decision problem DMULT — the graph of multiplication

Although it is not directly related to the issue of verification, another Boolean function that has been considered is the decision problem DMULT of recognizing the graph of multiplication. That is, DMULT$(x, y, z) = 1$ if and only if $xy = z$. Note that it is not readily apparent which problem is "harder", MULT or DMULT. On the one hand, DMULT seems to require practically computing all the bits of $xy$; on the other hand, an algorithm for DMULT has the advantage of inspecting all the bits of $z$, the putative product. Buss [Bu92] proves that DMULT $\notin$ AC$^0$ by showing that it is equivalent to counting the number of 1's in the input (and therefore related to MULT and to PARITY by results of [CSV84]); for comparison, [FSS84] gives an easy reduction of PARITY to MULT to show MULT $\notin$ AC$^0$.

A simple argument [We94] shows that computing DMULT with read-once programs is as hard as factoring. Given a polynomial-size read-once program for DMULT and any integer $n$, the following procedure will either factor $n$ or determine that it is prime. First instantiate $n$ as the bits of $z$ in the read-once program, where $|z| = 2\lg n$ and $|x| = |y| = \lg n$. There is a satisfying assignment to the remaining input bits since $1 \cdot z = z$. Now attempt to construct a nontrivial factor by instantiating the bits of $x$ one at a time, maintaining the satisfiability of the program after each bit. If the only successful instantiations for $x$ are 1 and $z$, then $z$ is prime; otherwise, a nontrivial factor is determined. Since we can test the satisfiability of a read-once program in polynomial time, the entire procedure can be executed in polynomial time.
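The procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the construction of [We94]: the function `satisfiable` (a hypothetical name) stands in for the polynomial-time satisfiability test of a read-once program with $z$ fixed and is simulated here by brute force, $x$ is given as many bits as $z$, and rather than committing greedily to one bit value the sketch performs a depth-first search that abandons any prefix of $x$ with no satisfying completion, so it returns every surviving value of $x$.

```python
from itertools import product


def dmult(x, y, z):
    """DMULT(x, y, z) = 1 exactly when x * y = z."""
    return x * y == z


def satisfiable(z, prefix, width):
    """Hypothetical stand-in for the satisfiability test of a read-once program
    for DMULT with the bits of z fixed: can the bits in `prefix` (most
    significant first) be completed to a `width`-bit x for which some y gives
    DMULT(x, y, z) = 1?  Simulated here by brute force over the rest of x."""
    for rest in product((0, 1), repeat=width - len(prefix)):
        x = int("".join(map(str, prefix + list(rest))), 2)
        if x != 0 and z % x == 0 and dmult(x, z // x, z):
            return True
    return False


def surviving_x_values(z):
    """Fix the bits of x one at a time, keeping only prefixes after which the
    program is still satisfiable; return every full x that survives."""
    width = z.bit_length()
    found = []

    def extend(prefix):
        if not satisfiable(z, prefix, width):
            return                      # no satisfying completion: prune
        if len(prefix) == width:
            found.append(int("".join(map(str, prefix)), 2))
            return
        for b in (0, 1):
            extend(prefix + [b])

    extend([])
    return found


def factor_or_prime(z):
    """Return a nontrivial factor of z >= 2, or None if z is prime."""
    for x in surviving_x_values(z):
        if x not in (1, z):
            return x
    return None


print(factor_or_prime(91))  # 7
print(factor_or_prime(97))  # None
```

Since only prefixes of surviving values are expanded, the number of satisfiability tests is at most roughly twice the bit length of $z$ times the number of surviving values of $x$, that is, of divisors of $z$.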

Jukna [Ju94] proves an exponential lower bound for DMULT on nondeterministic read-$k$-times branching programs. His lower bound follows the framework of [BRS93] and gives a simple reduction of DMULT to the problem of recognizing codewords of a linear code, for which an exponential lower bound is proved in [Ju92].

2.7 Related issues 

2.7.1 The ordering problem for OBDD's 

When using OBDD's for verification, one naturally wants to minimize their size. For a given function, the order in which the variables are read greatly affects the number of nodes required: it is easy to exhibit functions which have small OBDD's for good orderings but require exponential size for poor orderings. Thus, an important and interesting question is how to determine the ordering that minimizes size for a given function. The decision problem is: given an OBDD and an integer $k$, determine whether there is an OBDD (possibly obeying a different ordering of the variables) with fewer than $k$ nodes that computes the same function. This problem was recently proved to be NP-complete in [BW95], extending the work of [BW95, THY93], via a nice reduction from OPTIMAL LINEAR ARRANGEMENT [GJ79].
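The dependence on the ordering is easy to observe on a small example. The Python sketch below counts the nodes of the reduced OBDD of a function under a given order by brute force. It relies on the standard characterization (assumed here, not proved in this section) that the nodes labelled by the $(i+1)$st variable of the order correspond to the distinct subfunctions, obtained by fixing the first $i$ variables, that essentially depend on that variable. The test function $x_1y_1 \vee x_2y_2 \vee x_3y_3$ is a textbook example chosen for illustration.

```python
from itertools import product


def obdd_size(f, n, order):
    """Node count (both sinks included) of the reduced OBDD for a non-constant
    f : {0,1}^n -> {0,1} under the variable order `order` (a permutation of
    range(n)).  Exponential in n: for tiny examples only."""
    internal = 0
    for i in range(n):
        fixed, rest = order[:i], order[i:]
        subfunctions = set()
        for a in product((0, 1), repeat=i):
            table = []
            for b in product((0, 1), repeat=n - i):
                x = [0] * n
                for var, val in zip(fixed, a):
                    x[var] = val
                for var, val in zip(rest, b):
                    x[var] = val
                table.append(f(x))
            subfunctions.add(tuple(table))
        # order[i] is the slowest-varying bit of b, so a subfunction depends
        # on it exactly when the two halves of its truth table differ.
        half = 2 ** (n - i - 1)
        internal += sum(1 for t in subfunctions if t[:half] != t[half:])
    return internal + 2


def f(x):
    # x1y1 or x2y2 or x3y3, with x1, x2, x3 at indices 0-2 and y1, y2, y3 at 3-5
    return int((x[0] and x[3]) or (x[1] and x[4]) or (x[2] and x[5]))


print(obdd_size(f, 6, [0, 3, 1, 4, 2, 5]))  # interleaved order: 8
print(obdd_size(f, 6, [0, 1, 2, 3, 4, 5]))  # all x's before all y's: 16
```

For $k$ pairs this function needs $2k + 2$ nodes under the interleaved order and $2^{k+1}$ nodes when all $x$'s precede all $y$'s.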

It would be useful to find an efficient algorithm to determine an approximately 
optimal ordering. Many heuristics for improving an ordering can be found in the 
literature (see [BW95]). It is worth mentioning that the use of randomization has 
not been explored, either in helping to determine good variable orderings or in the 
verification strategy more generally. 

2.7.2 The Fourier spectrum 

The Fourier spectrum of Boolean functions has been widely studied over the past few years. Properties of the Fourier spectrum have been used in a variety of applications, perhaps most strikingly in deriving efficient algorithms for learning (e.g., [KM91]). Two properties of the spectrum that have proven useful for this purpose are a small $L_1$-norm (that is, the sum of the absolute values of the coefficients) and knowledge of which coefficients are the largest. For example, [KM91] gives an efficient algorithm for functions whose spectrum is either sparse or has polynomial $L_1$-norm.

It is easy to show that the $L_1$-norm of a function is bounded by the number of leaves in any decision tree for that function, even if the nodes may query the parity of arbitrary subsets of the variables. Moreover, [LMN89] proves that functions in AC$^0$ have most of the weight of their spectrum in the coefficients of small sets. These results are used to derive efficient learning algorithms for functions in AC$^0$ and functions with shallow decision trees.

Since OBDD's are such a constrained model of computation, perhaps interesting and useful properties can be derived about the spectrum of the functions they compute. Some negative results are known: Bruck and Smolensky [BS90] demonstrate a function in AC$^0$ that has exponential $L_1$-norm; this function is easily computed by polynomial-size OBDD's. They also exhibit a function (inner product modulo 2), also easily computed by an OBDD, whose transform has $L_\infty$-norm less than l/2 log n. This is an even stronger result and further implies that any polynomial $p(x_1, \ldots, x_n)$ whose sign represents this function (i.e., which is negative exactly when the function is 1) must have 2 log n non-zero coefficients.
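These notions can be checked on small examples. The Python sketch below computes all Fourier coefficients with a fast Walsh-Hadamard transform, under the common convention (an assumption of the sketch) that $f$ takes values $\pm 1$, with $-1$ encoding the Boolean value 1, and $\hat{f}(S) = 2^{-n}\sum_x f(x)(-1)^{\sum_{i \in S} x_i}$. PARITY has a single nonzero coefficient, so its $L_1$-norm is 1, while every coefficient of inner product modulo 2 on $2k$ bits has magnitude $2^{-k}$, giving $L_1$-norm $2^k$ and $L_\infty$-norm $2^{-k}$.

```python
def fourier_coefficients(f, n):
    """Fourier coefficients of f : {0,1}^n -> {-1, +1}, with inputs x and
    index sets S both encoded as n-bit masks:
        hat_f(S) = 2^{-n} * sum_x f(x) * (-1)^{|x & S|}.
    Computed with an in-place fast Walsh-Hadamard transform."""
    a = [f(x) for x in range(1 << n)]
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return [c / (1 << n) for c in a]


def l1_norm(coeffs):
    return sum(abs(c) for c in coeffs)


def linf_norm(coeffs):
    return max(abs(c) for c in coeffs)


def parity(x):
    # -1 encodes the Boolean value 1
    return -1 if bin(x).count("1") % 2 else 1


def inner_product_mod2(k):
    """IP on 2k bits: the low k bits form one vector, the high k bits the other."""
    def f(x):
        u, v = x & ((1 << k) - 1), x >> k
        return -1 if bin(u & v).count("1") % 2 else 1
    return f


c = fourier_coefficients(parity, 4)
print(l1_norm(c))                # 1.0
c = fourier_coefficients(inner_product_mod2(3), 6)
print(l1_norm(c), linf_norm(c))  # 8.0 0.125
```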

Comparing OBDD's with constant-depth circuits, we note that PARITY, though not in AC$^0$, is easily computed by small OBDD's, while $\pi$-MATRIX is easily computed in AC$^0$ but requires exponential-size OBDD's.

2.7.3 Read-once programs and resolution proofs 

If we consider branching programs for computing multi-valued functions, we find a nice correspondence with resolution proofs.

A resolution proof for a CNF formula $\phi$ is a straight-line program for proving that $\phi$ is not satisfiable. At each step, two previously obtained clauses, $(x_i \vee \alpha)$ and $(\bar{x}_i \vee \beta)$, are "resolved on $x_i$" to obtain a new clause $(\alpha \vee \beta)$, which is satisfied by every assignment that satisfies both previous clauses ($\alpha$ and $\beta$ are disjunctions of literals). The proof is complete when the empty clause is obtained. Such a proof is naturally viewed as a directed acyclic graph whose nodes correspond to the clauses: the original clauses of $\phi$ are "input" nodes with indegree 0, the newly obtained clauses are "internal" nodes with indegree 2, and the empty clause is the "output" node with outdegree 0. Such a resolution proof is called regular if on every directed path from an input node to the output node, each variable is resolved at most once.

We may consider a branching program for an unsatisfiable CNF formula $\phi$ that solves the following "search" problem: given an assignment $x$, find a clause of $\phi$ that is not satisfied. It is an observation of Chvátal and Szemerédi (see [LNNW95]) that read-once programs for this problem are isomorphic to regular resolution proofs. Taken together with the fact that a decision tree is a read-once branching program, this leads [LNNW95] to note that $D(\phi) \geq \lg \mathrm{RRES}(\phi)$, where $D(\phi)$ is the depth of the shallowest decision tree for this search problem and $\mathrm{RRES}(\phi)$ is the minimum number of steps in a regular resolution proof of $\phi$.
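The definitions can be made concrete with a short Python sketch (our own illustration, not taken from [LNNW95]). Clauses are sets of nonzero integers, with $-v$ standing for $\bar{x}_v$. `check_refutation` replays a list of resolution steps, reports whether it ends in the empty clause, and tests regularity by recording, for each clause, the variables resolved on some path into it. `falsified_clause` solves the search problem by scanning the clauses in order, which is the idea behind a branching program of size $O(|\phi|)$ for it.

```python
def resolve(c1, c2, v):
    """Resolve c1 (containing x_v) with c2 (containing the negation of x_v)."""
    if v not in c1 or -v not in c2:
        raise ValueError("clauses cannot be resolved on this variable")
    return (c1 - {v}) | (c2 - {-v})


def check_refutation(cnf, steps):
    """Replay steps (i, j, v): resolve clause i with clause j on x_v, where the
    clauses of cnf are numbered first and derived clauses follow in order.
    Returns (ends_in_empty_clause, is_regular)."""
    clauses = [frozenset(c) for c in cnf]
    # variables resolved on some path from an input clause into each clause
    resolved_on = [frozenset() for _ in clauses]
    regular = True
    for i, j, v in steps:
        if v in resolved_on[i] or v in resolved_on[j]:
            regular = False  # some path resolves x_v a second time
        clauses.append(resolve(clauses[i], clauses[j], v))
        resolved_on.append(resolved_on[i] | resolved_on[j] | {v})
    return clauses[-1] == frozenset(), regular


def falsified_clause(cnf, assignment):
    """The search problem: a clause of cnf falsified by `assignment`
    (a dict from variable to bool), found by scanning the clauses."""
    for c in cnf:
        if not any(assignment[abs(lit)] == (lit > 0) for lit in c):
            return c
    return None


# (x1 or x2)(x1 or not x2)(not x1 or x2)(not x1 or not x2)
phi = [{1, 2}, {1, -2}, {-1, 2}, {-1, -2}]
steps = [(0, 1, 2), (2, 3, 2), (4, 5, 1)]  # derives (x1), (not x1), then ()
print(check_refutation(phi, steps))                        # (True, True)
print(sorted(falsified_clause(phi, {1: True, 2: False})))  # [-1, 2]
```

Here a depth-2 decision tree (query $x_1$, then $x_2$) solves the search problem, and the regular refutation above has 3 steps, consistent with $D(\phi) \geq \lg \mathrm{RRES}(\phi)$.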
In general, an arbitrary resolution proof for $\phi$ yields a branching program for this search problem, but not vice versa: in fact, there are formulas for which $\mathrm{RES}(\phi)$ is exponential (e.g., [CS88]), even though there is always a branching program of size $O(|\phi|)$ for the search problem.
Chapter 3



A lower bound for multiplication 
with read-once programs 



This chapter describes a lower bound of $2^{\Omega(\sqrt{n})}$ on the size of read-once branching programs for the function MULT. This is the first superpolynomial lower bound for multiplication on non-oblivious branching programs. This result demonstrates that relaxing the ordering restriction of OBDD's is insufficient to gain the computational power desired for the purpose of hardware verification.

The lower bound for multiplication is motivated by the work of Simon and Szegedy [SS93], who give a basic lemma for proving lower bounds on the size of read-once branching programs. The lemma involves Nečiporuk's method of counting the subfunctions that are possible when some subset of input bits is fixed. We begin by describing this lemma in Section 3.1. For ease of presentation we first prove a lower bound of $2^{\Omega(\sqrt[3]{n})}$ in Section 3.2, and then extend the proof to achieve $2^{\Omega(\sqrt{n})}$ in Section 3.3.

In Section 3.4, we define the notion of read-once reductions in order to deduce 
similar lower bounds for other arithmetic functions. 

3.1 A paradigm for read-once lower bounds 

Let $f$ be a Boolean function, $f : \{0,1\}^n \to \{0,1\}$, and let $X = \{x_0, \ldots, x_{n-1}\}$ be its $n$ binary input variables. Let $\mathcal{F}$ be a filter on $X$. (That is, $\mathcal{F} \subseteq 2^X$ and $\mathcal{F}$ is closed upward: if $S \in \mathcal{F}$, then all supersets of $S$ are in $\mathcal{F}$.) A subset $B \subseteq X$ is said to be in the boundary of $\mathcal{F}$ if $B \notin \mathcal{F}$ but $B \cup \{x_i\} \in \mathcal{F}$ for some $x_i$. By setting the values of $\bar{B} = X \setminus B$, we naturally induce a function on $B$. The lemma is stated below in the form we will need; it appears in [SS93] in a slightly more general form.

Lemma 1 (Simon and Szegedy) If for every $B$ in the boundary of $\mathcal{F}$, at most $2^{|\bar{B}|}/L$ settings to $\bar{B}$ induce the same subfunction on $B$, then any read-once branching program computing $f$ has size at least $L$.

For completeness, we now provide a proof of this lemma. 

Proof: The idea is to identify a "frontier" of edges in the branching program (a cut containing exactly one edge from each source-to-sink path) in which every edge allows only a fraction $1/L$ of the inputs in $\{0,1\}^n$ to pass through it. Since the path of every input passes through some frontier edge, there must be at least $L$ such edges. Having fan-out 2 and only one root, the program also has at least $L$ nodes. This is because if the endvertices of the frontier edges were distinct, they would be the leaves of an embedded binary tree which must contain $L - 1$ distinct internal nodes. Since the two sinks are not among these internal nodes, there are at least $L + 1$ nodes in the program.

In order to characterize a frontier, we first associate with each node of the program the set of variables appearing in the subprogram rooted there, that is, those variables appearing on nodes that are reachable from the given node. Clearly, along any path through the program, the variable-sets of later nodes are subsets of the variable-sets of earlier nodes. A frontier consists of those edges going from nodes with "large" sets of variables to nodes with "small" sets. "Large" sets are defined to be those that are in the filter $\mathcal{F}$. Clearly there is exactly one frontier edge on each source-to-sink path: (for nontrivial filters $\mathcal{F}$) the root has the variable-set $X \in \mathcal{F}$ and the sinks have the variable-set $\emptyset \notin \mathcal{F}$, and since variable-sets only shrink along a path and $\mathcal{F}$ is closed upward, a path that leaves $\mathcal{F}$ never re-enters it. With each frontier edge we associate a set $B \subseteq X$ in the boundary of $\mathcal{F}$.

Suppose boundary set $B$ is associated with a given frontier edge. Because the program is read-once, these variables do not appear on any path from the root to this edge. In fact, the inputs $x \in \{0,1\}^n$ that reach this edge are characterized exactly by their settings to $\bar{B}$. Each setting to $\bar{B}$ that reaches this edge clearly induces the same subfunction on $B$, as defined by the subprogram rooted there. Since at most $2^{|\bar{B}|}/L$ settings to $\bar{B}$ give the same subfunction on $B$, at most $(2^{|\bar{B}|}/L) \cdot 2^{|B|} = 2^n/L$ inputs in $\{0,1\}^n$ may pass through this frontier edge. The lower bound of $L$ then follows. ■
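The hypothesis of Lemma 1 can be evaluated mechanically for small functions. The Python sketch below is only an illustration: it uses a threshold filter $\mathcal{F} = \{V : |V| \geq k\}$ (our choice; the proofs in this chapter use different filters), whose boundary consists of the sets of size $k-1$, and `simon_szegedy_bound` is a hypothetical name. For each boundary set $B$ it groups the settings to $\bar{B}$ by the subfunction they induce on $B$; the largest $L$ the lemma allows is the minimum over $B$ of $2^{|\bar{B}|}$ divided by the largest group.

```python
from fractions import Fraction
from itertools import combinations, product


def simon_szegedy_bound(f, n, k):
    """Largest L satisfying the hypothesis of Lemma 1 for f : {0,1}^n -> {0,1}
    and the threshold filter {V : |V| >= k}, 1 <= k <= n."""
    best = None
    for B in combinations(range(n), k - 1):       # boundary sets: |B| = k - 1
        comp = [i for i in range(n) if i not in B]
        groups = {}                               # subfunction on B -> count
        for a in product((0, 1), repeat=len(comp)):
            table = []
            for b in product((0, 1), repeat=len(B)):
                x = [0] * n
                for i, val in zip(comp, a):
                    x[i] = val
                for i, val in zip(B, b):
                    x[i] = val
                table.append(f(x))
            groups[tuple(table)] = groups.get(tuple(table), 0) + 1
        bound = Fraction(2 ** len(comp), max(groups.values()))
        best = bound if best is None else min(best, bound)
    return best


print(simon_szegedy_bound(lambda x: sum(x) % 2, 4, 2))   # 2
print(simon_szegedy_bound(lambda x: int(all(x)), 4, 2))  # 8/7
```

Such a naive filter gives only trivial bounds; the work in the following sections lies in choosing $\mathcal{F}$ so that few settings share a subfunction.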



3.2 A lower bound of $2^{\Omega(\sqrt[3]{n})}$

Theorem 4 Any read-once branching program for MULT has size $2^{\Omega(\sqrt[3]{n})}$.

Proof: Let $m = \sqrt[3]{n}/4$ and let $X$ and $Y$ denote the sets of variables $X = \{x_0, \ldots, x_{n-1}\}$ and $Y = \{y_0, \ldots, y_{n-1}\}$. Define the filter

$$\mathcal{F} = \{V \subseteq X \cup Y : |V \cap X| > n - m \text{ and } |V \cap Y| > n - m\}.$$

Roughly speaking this filter marks the frontier of the program where at most $m$ bits of $X$ and at most $m$ bits of $Y$ have been read.¹

We will show that for any $B$ in the boundary of $\mathcal{F}$, at most $2^{|\bar{B}| - m}$ settings to $\bar{B}$ give the same subfunction on $B$. By Lemma 1, this gives the desired lower bound of $2^m$. Fix any $B$ in the boundary of $\mathcal{F}$ and let $S = \bar{B}$. Think of $S$ as being the variables already read by the branching program. Since $B$ is in the boundary of $\mathcal{F}$, either $|S \cap X| = m$ or $|S \cap Y| = m$. We will show that there is a subset $S' \subseteq S$ of size at least $m$ such that if two settings to $S$ differ on $S'$ then they induce different subfunctions on $\bar{S} = B$. Thus at most $2^{|S| - m}$ settings to $S$ induce the same subfunction on $\bar{S} = B$, as desired. We will show that the two subfunctions are different by explicitly demonstrating a single setting to the bits of $\bar{S}$ where the induced subfunctions of MULT differ.

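Suppose without loss of generality that $|S \cap X| = m$ (and $|S \cap Y| < m$). Let $i \in \{0, \ldots, n-1\}$ be the smallest index such that $y_i \notin S$. Let

$$S' = \{y_0, \ldots, y_{i-1}\} \cup \left(S \cap \{x_0, \ldots, x_{n-1-i}\}\right).$$

Note that because $\{y_0, \ldots, y_{i-1}\} \subseteq S$ and $|S \cap X| = m$, and since at most $i$ variables of $X$ have index greater than $n-1-i$, we have $|S'| \geq i + (m - i) = m$.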
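The count $|S'| \geq m$ can be confirmed exhaustively for small parameters. In the Python sketch below (the function names are ours), $S$ is given by the index sets of its $X$- and $Y$-variables, and the check ranges over every $S$ with $|S \cap X| = m$ and $|S \cap Y| < m$.

```python
from itertools import combinations


def s_prime_size(n, sx, sy):
    """|S'| for S = {x_j : j in sx} | {y_j : j in sy}, where
    S' = {y_0, ..., y_{i-1}} | (S & {x_0, ..., x_{n-1-i}}) and i is the
    smallest index with y_i not in S."""
    i = min(j for j in range(n) if j not in sy)
    return i + sum(1 for j in sx if j <= n - 1 - i)


def claim_holds(n, m):
    """Check |S'| >= m for every S with |S & X| = m and |S & Y| < m."""
    for sx in combinations(range(n), m):
        for r in range(m):
            for sy in combinations(range(n), r):
                if s_prime_size(n, set(sx), set(sy)) < m:
                    return False
    return True


print(claim_holds(6, 3), claim_holds(8, 2))  # True True
print(s_prime_size(6, {3, 4, 5}, set()))     # 3: the bound m is attained
```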



¹In order for this notion to be strictly correct, "have been read" must be interpreted to mean "appear on any path from the root".
Let us adopt the following notation for the integers obtained from partial settings to the variables. For a setting $\alpha$ to $W \subseteq X \cup Y$ (i.e., $\alpha : W \to \{0,1\}$), let $x_\alpha$ denote the integer that is represented in binary when the variables of $X \cap W$ have the values given by $\alpha$ and the variables of $X \setminus W$ are each 0. Define $y_\alpha$ similarly. For a single variable $z \notin W$, let "$\alpha + z$" denote the setting to $W \cup \{z\}$ that further sets $z = 1$. For two settings $\alpha$ and $\tau$ to disjoint subsets $W$ and $V$, let "$\alpha \cup \tau$" denote the setting equal to $\alpha$ on $W$ and to $\tau$ on $V$. Finally, let $(x)_i$ denote the $i$th bit in the binary representation of the integer $x$, so $x = \sum_{i \geq 0} (x)_i 2^i$.
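The notation translates directly into code, which may help when checking the examples that follow. In this Python sketch, a setting is a dictionary from variables such as `("x", 3)` to 0 or 1; this representation and the function names are our own choices.

```python
def integer_of(setting, letter):
    """x_sigma (letter "x") or y_sigma (letter "y"): bit i is the value the
    setting gives that variable; unset variables are read as 0."""
    return sum(val << i for (name, i), val in setting.items() if name == letter)


def plus(setting, var):
    """sigma + z: additionally set the (previously unset) variable z to 1."""
    if var in setting:
        raise ValueError("variable already set")
    return {**setting, var: 1}


def union(sigma, tau):
    """sigma U tau for settings to disjoint sets of variables."""
    if set(sigma) & set(tau):
        raise ValueError("settings overlap")
    return {**sigma, **tau}


def bit(value, i):
    """(value)_i: bit i of the binary representation of value."""
    return (value >> i) & 1


def product_bit(setting, n):
    """(x_sigma * y_sigma)_{n-1}: the product bit compared in the proof."""
    return bit(integer_of(setting, "x") * integer_of(setting, "y"), n - 1)


alpha = {("x", 0): 1, ("x", 1): 1, ("y", 0): 1}  # x_alpha = 3, y_alpha = 1
print(product_bit(alpha, 4))                     # 3 = 0011, so bit 3 is 0
print(product_bit(plus(alpha, ("y", 2)), 4))     # 3 * 5 = 15 = 1111, so bit 3 is 1
```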

Let $\alpha$ and $\beta$ be two settings to $S$ that differ on some bit in $S'$. Our goal is thus to find a setting $\tau$ to the bits of $\bar{S}$ so that $(x_{\alpha\cup\tau}\, y_{\alpha\cup\tau})_{n-1} \neq (x_{\beta\cup\tau}\, y_{\beta\cup\tau})_{n-1}$.

We proceed in two stages, according to Lemmas 2 and 3. First we ensure, by setting to 1 (if necessary) a single variable $z$ of $\bar{S}$, that the two products $x_{\alpha+z} y_{\alpha+z}$ and $x_{\beta+z} y_{\beta+z}$ differ in a "high-order" bit, that is, a bit position in the range $[n-m-3, n-1]$ (we aren't concerned with higher bit positions). In the second stage, we set to 1 a pair of variables of $\bar{S}$, one in $X$ and one in $Y$, so that the resulting products differ in a higher high-order bit position. We iterate this second stage, repeatedly setting a pair of variables until the resulting products differ in bit position $n-1$. It follows that $\alpha$ and $\beta$ induce different subfunctions on $\bar{S}$: the subfunctions differ when $\bar{S}$ has $z$ and the pairs from the second stage all set to 1 and the remaining bits of $\bar{S}$ set to 0.

Lemma 2 If for all $i \in [n-m-3, n-1]$ we have $(x_\alpha y_\alpha)_i = (x_\beta y_\beta)_i$, then there is a single variable $z \in \bar{S}$ such that

$$(x_{\alpha+z} y_{\alpha+z})_i \neq (x_{\beta+z} y_{\beta+z})_i$$

for some $i \in [n-m-3, n-1]$.

Lemma 3 Let $T \subseteq X \cup Y$, and let $\alpha$ and $\beta$ be two settings to $T$. Let $d$ be the greatest index in $[0, n-2]$ such that $(x_\alpha y_\alpha)_d \neq (x_\beta y_\beta)_d$. If $d \geq n-m-3$ and $\max(|T \cap X|, |T \cap Y|) = t \leq 3m$, then there are two variables, $x_u \in X \setminus T$ and $y_v \in Y \setminus T$, such that

$$(x_{\alpha'} y_{\alpha'})_{d+1} \neq (x_{\beta'} y_{\beta'})_{d+1},$$

where $\alpha' = \alpha + x_u + y_v$ and $\beta' = \beta + x_u + y_v$.
Theorem 4 follows from these lemmas as outlined above. Notice that Lemma 3 is first applied with $t \leq m + 1$, and since we must apply Lemma 3 at most $m + 3$ times, each time setting one more variable of $X$ and of $Y$, we maintain $t \leq 2m + 4 \leq 3m$ (for $m \geq 4$) as required. ■

We now give the proofs of Lemmas 2 and 3. 

Proof of Lemma 2: The settings $\alpha$ and $\beta$ differ on $S' \subseteq S$; suppose first that they differ in a bit of $S' \cap X$.



Figure 3.1: The integers modulo $2^n$, drawn as a circle showing $x_\alpha y_\alpha$, $x_\beta y_\beta$, and their translates. In order for $x_{\beta+y_k} y_{\beta+y_k}$ and $x_{\alpha+y_k} y_{\alpha+y_k}$ to fall into different segments, we must choose $k$ so that $2^k(x_\alpha - x_\beta)$ has large magnitude.

The proof is most easily explained by picturing the integers modulo $2^n$ on a circle. Partition the circle into $2^{m+3}$ equal-sized segments according to the values of the $m+3$ highest bits, so each segment contains $2^{n-m-3}$ consecutive integers, as depicted in Figure 3.1. The hypothesis of the lemma is that $x_\alpha y_\alpha$ and $x_\beta y_\beta$ fall into the same segment. If we set bit $y_k \in \bar{S} \cap Y$ to 1, we obtain the products $x_{\alpha+y_k} y_{\alpha+y_k} = x_\alpha y_\alpha + 2^k x_\alpha$ and $x_{\beta+y_k} y_{\beta+y_k} = x_\beta y_\beta + 2^k x_\beta$. The product $x_{\alpha+y_k} y_{\alpha+y_k}$ is obtained by a translation of $2^k x_\alpha$ along the circle from $x_\alpha y_\alpha$, and $x_{\beta+y_k} y_{\beta+y_k}$ is obtained by a translation of $2^k x_\beta$ from $x_\beta y_\beta$. If, modulo $2^n$, the difference $2^k(x_\alpha - x_\beta)$ of these translations is at least $2^{n-m-2}$, or two segments long, and at most $2^n - 2^{n-m-2}$, or "negative two" segments long, then it is clear that the translates $x_{\alpha+y_k} y_{\alpha+y_k}$ and $x_{\beta+y_k} y_{\beta+y_k}$ fall into different segments. It follows that the products $x_{\alpha+y_k} y_{\alpha+y_k}$ and $x_{\beta+y_k} y_{\beta+y_k}$ differ in a high-order bit position.
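The geometric claim can be checked exhaustively for small $n$ and $m$. In the Python sketch below (the names and the parameter `gap` are ours), $t_1$ and $t_2$ play the roles of the translations $2^k x_\alpha$ and $2^k x_\beta$, and $D = t_1 - t_2 \bmod 2^n$. With a gap of two segments the check succeeds; with a gap of only one segment it fails, which is why two segments are required.

```python
def segment(value, n, m):
    """Segment of value mod 2^n: the integer formed by its m + 3 highest bits."""
    return (value % (1 << n)) >> (n - m - 3)


def translates_separate(n, m, gap=2):
    """Check: whenever P and Q lie in the same segment and the translations
    satisfy gap*seg <= (t1 - t2) mod 2^n <= 2^n - gap*seg, the translates
    P + t1 and Q + t2 lie in different segments."""
    size, seg = 1 << n, 1 << (n - m - 3)
    for P in range(size):
        start = P - P % seg
        for Q in range(start, start + seg):          # Q in P's segment
            for t2 in range(size):
                for D in range(gap * seg, size - gap * seg + 1):
                    if segment(P + t2 + D, n, m) == segment(Q + t2, n, m):
                        return False
    return True


print(translates_separate(5, 0), translates_separate(5, 1))  # True True
print(translates_separate(5, 0, gap=1))                      # False
```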

It only remains to show how to choose $y_k \in \bar{S} \cap Y$ so that $2^{n-m-2} \leq 2^k(x_\alpha - x_\beta) \leq 2^n - 2^{n-m-2}$ modulo $2^n$. Let $\tilde{x} = x_\alpha - x_\beta$. It is useful now to think in terms of the table generated by the usual grade-school algorithm for multiplying $\tilde{x}$ by $y$, as shown in Figure 3.2.





Figure 3.2: The table generated by the grade-school algorithm for multiplying $\tilde{x} = x_\alpha - x_\beta$ by $y$. We choose a bit $y_k$ to set to 1 so that the least significant 1 in $\tilde{x}$ is shifted into a "high-order" bit position.



In this table, the rows are the partial products, indexed by $y_0, \ldots, y_{n-1}$. The diagonals are indexed by $\tilde{x}_{n-1}, \ldots, \tilde{x}_0$. Since $\alpha$ and $\beta$ differ in a bit of $S' \cap X \subseteq \{x_0, \ldots, x_{n-1-i}\}$, the difference $\tilde{x} = x_\alpha - x_\beta$ must have a 1 somewhere in the range of bit positions $[0, n-1-i]$. Let $j$ be the position of the least significant 1 in $\tilde{x}$, so that either there is a 0 in position $j-1$, or $j = 0$. We now choose any variable of $\bar{S} \cap Y$ with index $k$ in the range $[(n-1)-j-m, (n-1)-j]$. This range must contain a variable $y_k \in \bar{S} \cap Y$ because if $j \leq n-m-1$, the range has at least $m+1$ elements but $|S \cap Y| < m$; if $j \geq n-m$, we may choose $k = i$ (by definition, $y_i \notin S$), which lies in the range $[0, n-1-j]$ since $j \leq n-1-i$. This ensures that $2^k\tilde{x}$ has a 1 in position $j+k$ and a 0 in position $j+k-1$, where $n-1-m \leq j+k \leq n-1$. It follows that modulo $2^n$, we have $2^{n-m-1} \leq 2^k\tilde{x} \leq 2^n - 1 - 2^{n-m-2}$, the upper bound attained if all bits except bit $j+k-1$ are 1's and $j+k = n-1-m$. This satisfies the desired bounds.
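The choice of $k$ can likewise be checked exhaustively for small parameters. The Python sketch below (names ours) ranges over every nonzero residue $\tilde{x}$ modulo $2^n$, finds the position $j$ of its least significant 1, and verifies that every admissible $k \geq 0$ in $[(n-1)-j-m, (n-1)-j]$ places $2^k\tilde{x} \bmod 2^n$ between two segments and "negative two" segments.

```python
def lowest_one(v):
    """Position of the least significant 1 of v > 0."""
    return (v & -v).bit_length() - 1


def k_choices_work(n, m):
    """For every nonzero xt mod 2^n with least significant 1 at j, check that
    each k >= 0 in [(n-1)-j-m, (n-1)-j] gives
    2^{n-m-2} <= (2^k * xt mod 2^n) <= 2^n - 2^{n-m-2}."""
    size, two_segments = 1 << n, 1 << (n - m - 2)
    for xt in range(1, size):
        j = lowest_one(xt)
        for k in range(max(0, n - 1 - j - m), n - j):
            r = (xt << k) % size
            if not two_segments <= r <= size - two_segments:
                return False
    return True


print(k_choices_work(6, 1), k_choices_work(8, 3))  # True True
```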

If $\alpha$ and $\beta$ differ in a bit of $S' \cap Y \subseteq \{y_0, \ldots, y_{i-1}\}$, the proof is essentially the same. We have to choose $x_k \in \bar{S} \cap X$ so that $2^{n-m-1} \leq 2^k(y_\alpha - y_\beta) \leq 2^n - 2^{n-m-2} - 1$ modulo $2^n$. In this case, we know $\tilde{y} = y_\alpha - y_\beta$ has a 1 in the range $[0, i-1]$. Again letting $j$ be the position of the least significant 1 of $\tilde{y}$ in this range, we simply choose $k$ anywhere in the range $[n-1-j-m, n-1-j]$. Since $j \leq i-1 < m$ and $n \geq \sqrt[3]{n} > 1+j+m$, this range always has $m+1$ elements, so, as $|S \cap X| = m$, it contains the index of some $x_k \in \bar{S} \cap X$. It follows as before that $2^k\tilde{y}$ satisfies the desired inequality. This completes the proof. ■

Lemma 3 Let $T \subseteq X \cup Y$, and let $\alpha$ and $\beta$ be two settings to $T$. Let $d$ be the greatest index in $[0, n-2]$ such that $(x_\alpha y_\alpha)_d \neq (x_\beta y_\beta)_d$. If $d \geq n-m-3$ and $\max(|T \cap X|, |T \cap Y|) = t \leq 3m$, then there are two variables, $x_u \in X \setminus T$ and $y_v \in Y \setminus T$, such that

$$(x_{\alpha'} y_{\alpha'})_{d+1} \neq (x_{\beta'} y_{\beta'})_{d+1},$$

where $\alpha' = \alpha + x_u + y_v$ and $\beta' = \beta + x_u + y_v$.

Proof of Lemma 3: We will consider all pairs of variables $(x_u, y_v)$ such that $u + v = d$. We want $(x_{\alpha'} y_{\alpha'})_{d+1} \neq (x_{\beta'} y_{\beta'})_{d+1}$, where

$$x_{\alpha'} y_{\alpha'} = (x_\alpha + 2^u)(y_\alpha + 2^v) = (x_\alpha y_\alpha + 2^d) + (2^v x_\alpha + 2^u y_\alpha)$$

and

$$x_{\beta'} y_{\beta'} = (x_\beta + 2^u)(y_\beta + 2^v) = (x_\beta y_\beta + 2^d) + (2^v x_\beta + 2^u y_\beta).$$

Since $d$ is the highest bit in which $x_\alpha y_\alpha$ and $x_\beta y_\beta$ differ, clearly $(x_\alpha y_\alpha + 2^d)_{d+1} \neq (x_\beta y_\beta + 2^d)_{d+1}$. We will choose $u$ and $v$ so that the addition of the "cross terms" $2^v x_\alpha + 2^u y_\alpha$ to $x_\alpha y_\alpha + 2^d$ does not affect bits $d$ or $d+1$ of $x_\alpha y_\alpha + 2^d$ (and similarly for $\beta$). In order to do this, we choose $u$ and $v$ so that in each case, the cross terms have 0's in bit positions $d$ and $d+1$ and furthermore, in the addition of the two integers, there is no carry bit into position $d$.
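For intuition, the search that Lemma 3 guarantees to succeed can be carried out by brute force on small examples. The Python sketch below is not the counting argument of the proof: it simply tries every pair $(u, v)$ with $u + v = d$ whose variables are unset (`used_x` and `used_y` hold the indices in $T \cap X$ and $T \cap Y$) and returns the first one for which bit $d+1$ differs. For small $n$ such a pair need not exist; the lemma guarantees one only under its hypotheses.

```python
def bit(value, i):
    return (value >> i) & 1


def lemma3_pair(n, xa, ya, xb, yb, used_x, used_y):
    """Brute-force search for (u, v), u + v = d, with x_u, y_v unset, such that
    bit d+1 of (x + 2^u)(y + 2^v) mod 2^n differs between the two settings;
    d is the greatest index in [0, n-2] where x_a*y_a and x_b*y_b differ."""
    size = 1 << n
    pa, pb = xa * ya % size, xb * yb % size
    diffs = [i for i in range(n - 1) if bit(pa, i) != bit(pb, i)]
    if not diffs:
        return None
    d = max(diffs)
    for u in range(d + 1):
        v = d - u
        if u in used_x or v in used_y:
            continue
        # (x + 2^u)(y + 2^v) = (x*y + 2^d) + (2^v*x + 2^u*y), since u + v = d
        pa2 = (xa + (1 << u)) * (ya + (1 << v)) % size
        pb2 = (xb + (1 << u)) * (yb + (1 << v)) % size
        if bit(pa2, d + 1) != bit(pb2, d + 1):
            return u, v
    return None


# alpha sets x_0 = 1, y_5 = 1; beta sets x_0 = 0, y_5 = 1 (so T = {x_0, y_5})
print(lemma3_pair(8, 1, 32, 0, 32, {0}, {5}))  # (1, 4)
```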



Figure 3.3: In Lemma 3, we choose $x_u$ and $y_v$ to set to 1 so that $u + v = d$ and also so that the products $2^u y_\alpha$ and $2^v x_\alpha$ have 0's in bit positions $d+1, \ldots, i-1$, so that when added to $x_\alpha y_\alpha + 2^d$, they do not cause a carry to propagate into position $d+1$.



To accomplish this, we first find the largest bit position $i$ less than $d$ where $x_\alpha y_\alpha$ has a 0 (so positions $i+1$ through $d-1$ are all 1's). We will choose $u$ and $v$ so that $2^v x_\alpha$ and $2^u y_\alpha$ each has 0's in positions $i-1$ through $d+1$. It follows that their sum then has 0's in positions $i$ through $d+1$, and so, when added to $x_\alpha y_\alpha + 2^d$, which has a 0 in position $i$, causes no carry into any position $i+1$ through $d$ (see Figure 3.3). We will choose $u$ and $v$ so that the same conditions hold for $\beta$ as well.

A simple counting argument now shows that there exist $u$ and $v$ as desired. First, we claim that $x_\alpha y_\alpha$ (and $x_\beta y_\beta$) has 1's in at most $t^2$ bit positions, so that $(d-1) - i \leq t^2$. In general, if the binary representations of integers $p$ and $q$ have $w(p)$ and $w(q)$ 1's in them respectively, then clearly $p + q$ has at most $w(p) + w(q)$ 1's in it. Recall that $\alpha$ sets at most $t$ bits of $X$ and at most $t$ bits of $Y$. We may therefore view $x_\alpha y_\alpha$ as the addition of at most $t$ shifts of $x_\alpha$, and the claim follows.
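Both facts used here can be tested numerically. The Python sketch below checks the subadditivity $w(p+q) \leq w(p) + w(q)$ on all small pairs and computes the largest weight of $(x \cdot y) \bmod 2^n$ over operands with at most $t$ ones each, which the argument bounds by $t^2$; the particular parameters are arbitrary choices for illustration.

```python
from itertools import combinations


def weight(v):
    """w(v): number of 1's in the binary representation of v >= 0."""
    return bin(v).count("1")


def max_product_weight(n, t):
    """Largest w((x * y) mod 2^n) over x, y < 2^n with at most t ones each."""
    vals = [sum(1 << i for i in c)
            for r in range(t + 1) for c in combinations(range(n), r)]
    size = 1 << n
    return max(weight(x * y % size) for x in vals for y in vals)


# w(p + q) <= w(p) + w(q), checked on all pairs below 2^6
assert all(weight(p + q) <= weight(p) + weight(q)
           for p in range(64) for q in range(64))
for n, t in ((8, 2), (10, 3)):
    print(n, t, max_product_weight(n, t), "<=", t * t)
```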



We require $(2^v x_\alpha)_j = (2^v x_\beta)_j = 0$ (and likewise $(2^u y_\alpha)_j = (2^u y_\beta)_j = 0$) in at most $t^2 + 4$ positions $j$, namely $j = d+1, d, d-1, \ldots, i, i-1$. There are at most $t$ bit positions in which either $x_\alpha$ or $x_\beta$ has a 1, and for each such 1, there are at most $t^2 + 4$ "bad" values of $v \in [0, n-1]$ that shift the 1 to a position we require to be 0. Thus, $x_\alpha$ and $x_\beta$ rule out at most $t(t^2 + 4)$ values of $v$. Furthermore, there are up to $t$ variables of $Y$ that are in $T$, making a total of $t(t^2 + 4) + t$ values of $v$ that we may not choose. Similarly, a total of at most $t(t^2 + 4) + t$ values of $u$ are ruled out by $y_\alpha$, $y_\beta$, and $T$. The number of pairs $(x_u, y_v)$ in which either $x_u$ or $y_v$ has been ruled out is thus at most

$$2(t^3 + 5t) \leq 2(27m^3 + 15m) = 2\left(\frac{27n}{64} + \frac{15\sqrt[3]{n}}{4}\right),$$

since $t \leq 3m$ and $m = \sqrt[3]{n}/4$. There are at least $d + 1 \geq n - m - 2$ pairs $(x_u, y_v)$ such that $u + v = d$. Thus we retain at least

$$n - m - 2 - 2\left(\frac{27n}{64} + \frac{15\sqrt[3]{n}}{4}\right) = \frac{5n}{32} - \frac{31\sqrt[3]{n}}{4} - 2 = \Omega(n)$$

good pairs satisfying the desired requirements for $x_u$ and $y_v$. For $n > 378$, this expression is greater than 1, implying that there exists a pair as desired. ■
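The final inequality is easy to confirm numerically. Taking $m = \sqrt[3]{n}/4$ as in the proof of Theorem 4, the lower bound on retained pairs is $n - m - 2 - 2(27m^3 + 15m)$, which simplifies to $5n/32 - (31/4)\sqrt[3]{n} - 2$. The Python sketch below finds the smallest $n$ for which it exceeds 1; the expression is increasing from there on.

```python
def retained_pairs(n):
    """Lemma 3's lower bound on good pairs, with m = n^(1/3)/4 and t <= 3m:
    n - m - 2 - 2(27 m^3 + 15 m) = 5n/32 - 31 n^(1/3)/4 - 2."""
    m = n ** (1 / 3) / 4
    return n - m - 2 - 2 * (27 * m ** 3 + 15 * m)


first = next(n for n in range(1, 10_000) if retained_pairs(n) > 1)
print(first)  # 378
```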



3.3 Improving the bound to $2^{\Omega(\sqrt{n})}$

We can improve the lower bound to $2^{\Omega(\sqrt{n})}$ by analyzing more closely how we iterate Lemma 3 in the proof of the theorem. We begin with the observation that we needed $m = O(\sqrt[3]{n})$ because in Lemma 3, we used $t^2 = O(m^2)$ as an upper bound on the number of consecutive 1's to the right of position $d$ in $x_\alpha y_\alpha$ or $x_\beta y_\beta$. We then required 0's in these $O(m^2)$ positions in the cross terms $2^v x_\alpha + 2^u y_\alpha$ and $2^v x_\beta + 2^u y_\beta$. Since each of the $O(m)$ 1's in $x_\alpha$ may then rule out $O(m^2)$ values of $v$, we needed $O(m^3) < n$ in order not to rule out all values of $v$. In order to allow $m = \Theta(\sqrt{n})$, we will reduce to $O(m)$ the number of positions in which we require 0's in the cross terms. For the rest of this section, we let $m = \sqrt{n}/3$.



For example, if we knew that $x_\alpha y_\alpha$ and $x_\beta y_\beta$ looked like²

$$x_\alpha y_\alpha = \cdots \underset{d}{1}0 \cdots, \qquad x_\beta y_\beta = \cdots \underset{d}{0}0 \cdots,$$

then we would need to require 0's in the cross terms in only three positions: $d+1$, $d$, and $d-1$. This is sufficient to ensure that the addition of the cross term $2^v x_\alpha + 2^u y_\alpha$ to $x_\alpha y_\alpha + 2^d$ does not generate a carry into position $d$ and does not affect bits $d$ or $d+1$ of $x_\alpha y_\alpha + 2^d$. The same holds for $\beta$, and we get $(x_{\alpha'} y_{\alpha'})_{d+1} \neq (x_{\beta'} y_{\beta'})_{d+1}$. With only these three positions required to be 0's, the total number of $v$'s ruled out by $x_\beta$ and $x_\alpha$ is proportional to the number of 1's they contain, which is $O(m)$.

²Here and henceforth, "$\cdots$" denotes an arbitrary string of 0's and 1's; thus $x_\alpha y_\alpha = \cdots \underset{d}{1}0 \cdots$ has a 1 in bit $d$, a 0 in bit $d-1$, and may have any values in other bit positions.

Similarly, the cases

$$x_\alpha y_\alpha = \cdots \underset{d}{1}1 \cdots,\; x_\beta y_\beta = \cdots \underset{d}{0}0 \cdots \qquad\text{and}\qquad x_\alpha y_\alpha = \cdots \underset{d}{1}1 \cdots,\; x_\beta y_\beta = \cdots \underset{d}{0}1 \cdots$$

can be handled with only a few constraints by choosing $u + v = d - 1$ (this will be proved in Lemma 5). In fact, there is really only one case in which we need to require $(2^v x_\beta + 2^u y_\beta)$ or $(2^v x_\alpha + 2^u y_\alpha)$ to have many 0's:

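Definition 4 Let $d$ be the greatest index less than $n$ in which $(x_\alpha y_\alpha)_d \neq (x_\beta y_\beta)_d$. We say that $x_\alpha y_\alpha$ and $x_\beta y_\beta$ are $k$-bad if $d \geq n-m-4$ and the products look like

$$x_\alpha y_\alpha = \cdots \underset{d}{1}0 \cdots, \qquad x_\beta y_\beta = \cdots \underset{d}{0}11\cdots1\underset{n-m-6}{1}\underbrace{1\cdots1}_{k}\cdots,$$

that is, $x_\alpha y_\alpha$ has a 1 in position $d$ and a 0 in position $d-1$, while $x_\beta y_\beta$ has a 0 in position $d$ and 1's in every position from $d-1$ down to $n-m-6-k$, or vice versa (exchanging $\alpha$ and $\beta$).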

In this case, say $x_\beta y_\beta = \cdots \underset{d}{0}11\cdots1\underset{n-m-6}{1}\underbrace{1\cdots1}_{k}\cdots$, we must require $2^v x_\beta + 2^u y_\beta$ to be 0 in the positions of each of these 1's in order to prevent a carry into position $d+1$ when we add it to $x_\beta y_\beta + 2^d$. In order to allow $m = \Theta(\sqrt{n})$, we will ensure that the products are not $k$-bad for $k \geq m + 4$. Then the number of $v$'s ruled out by each 1 of $x_\alpha$ and $x_\beta$ is at most $2m + 10$, and as long as the number of 1's in $x_\alpha$ or $x_\beta$ is $O(m)$, the total number of $v$'s ruled out is $O(m^2)$.
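Because the definition is easy to misread, here is a small Python test of $k$-badness. It encodes the following reading of Definition 4 (our interpretation, consistent with the bound $2^{n-m-7} + 2^{n-m-4} - 1$ used for 1-bad products in the proof of Lemma 4): one product has bits $1, 0$ in positions $d, d-1$, and the other has 1's in every position from $d-1$ down to $n-m-6-k$.

```python
def is_k_bad(pa, pb, n, m, k):
    """k-badness of two products (mod 2^n) under the reading stated above.
    Assumes n - m - 6 - k >= 0."""
    diffs = [i for i in range(n) if (pa >> i) & 1 != (pb >> i) & 1]
    if not diffs:
        return False
    d = max(diffs)                       # highest position where they differ
    if d < n - m - 4:
        return False
    low = n - m - 6 - k
    for p, q in ((pa, pb), (pb, pa)):    # "or vice versa"
        top_is_10 = (p >> d) & 1 == 1 and (p >> (d - 1)) & 1 == 0
        run_of_ones = all((q >> i) & 1 for i in range(low, d))
        if top_is_10 and run_of_ones:
            return True
    return False


# n = 16, m = 2, d = 12: one product is 2^12, the other has 1's in positions 11..7
a, b = 1 << 12, sum(1 << i for i in range(7, 12))
print(is_k_bad(a, b, 16, 2, 1), is_k_bad(a, b, 16, 2, 2))  # True False
```

Under this reading a $k$-bad pair is also $(k-1)$-bad, so products that are not $k$-bad are not $(k+1)$-bad either; Lemma 5 below lets the threshold $k$ grow by one per application.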

We will first show that we may begin with products that differ in a high-order bit but are not 1-bad, and then prove a version of Lemma 3 in which each application allows the "badness" to grow by at most 1.

Lemma 4 For any two settings $\alpha$ and $\beta$ to $S$ that differ on a bit of $S'$, there are three (or fewer) variables $x_u, y_v, z \in \bar{S}$ such that for $\alpha' = \alpha + x_u + y_v + z$ and $\beta' = \beta + x_u + y_v + z$, the products $x_{\alpha'} y_{\alpha'}$ and $x_{\beta'} y_{\beta'}$ differ in a high-order bit (in the range $[n-m-4, n-1]$) and moreover, are not 1-bad.




(The comment "or fewer" refers to the fact that we may not need to set some or any 
of these three variables.) 

Lemma 5 Let $T \subseteq X \cup Y$, and let $\alpha$ and $\beta$ be two settings to $T$. Let $d$ be the greatest index in $[0, n-2]$ such that $(x_\alpha y_\alpha)_d \neq (x_\beta y_\beta)_d$. Suppose $d \geq n-m-4$ and $\max(|T \cap X|, |T \cap Y|) = t \leq 2m + 5$ and also that $x_\alpha y_\alpha$ and $x_\beta y_\beta$ are not $k$-bad, for some $k \leq m + 4$. Then there are two variables, $x_u, y_v \notin T$, such that

$$(x_{\alpha'} y_{\alpha'})_{d+1} \neq (x_{\beta'} y_{\beta'})_{d+1}$$

for $\alpha' = \alpha + x_u + y_v$ and $\beta' = \beta + x_u + y_v$, and moreover, $x_{\alpha'} y_{\alpha'}$ and $x_{\beta'} y_{\beta'}$ are not $(k+1)$-bad.

We now have 

Theorem 5 Any read-once branching program for MULT has size $2^{\Omega(\sqrt{n})}$.

Proof: The proof is exactly the same as the proof of Theorem 4 except for the lemmas. We start with products that differ in a high-order bit but are not 1-bad, as provided by Lemma 4. The number of variables in $X$ or $Y$ set in these products is at most $m + 2$. We obtain a difference in bit $n-1$ by iterating Lemma 5 at most $m + 3$ times, each time setting at most one variable in $X$ and in $Y$. This maintains $t \leq (m+2) + (m+3)$ and $k \leq 1 + (m+3)$ as required. ■

We now give the proofs of Lemmas 4 and 5, which we restate for convenience. 

Lemma 4 For any two settings $\alpha$ and $\beta$ to $S$ that differ on a bit of $S'$, there are three (or fewer) variables $x_u, y_v, z \in \bar{S}$ such that for $\alpha' = \alpha + x_u + y_v + z$ and $\beta' = \beta + x_u + y_v + z$, the products $x_{\alpha'} y_{\alpha'}$ and $x_{\beta'} y_{\beta'}$ differ in a high-order bit (in the range $[n-m-4, n-1]$) and moreover, are not 1-bad.

Proof: Either $x_\alpha y_\alpha$ and $x_\beta y_\beta$ differ (modulo $2^n$) by at least $2^{n-m-3}$ or not. If they do, then they must differ in a high-order bit (in the range $[n-m-4, n-1]$). If not, we proceed just as in Lemma 2 to find a variable $z$ such that $x_{\alpha+z} y_{\alpha+z}$ and $x_{\beta+z} y_{\beta+z}$ differ by at least $2^{n-m-3}$. As in Lemma 2, when $\alpha$ and $\beta$ differ in a bit of $S' \cap X$, it is sufficient to set to 1 a variable $y_k \in \bar{S} \cap Y$ such that $2^k(x_\alpha - x_\beta)$ is at least $2^{n-m-2}$, or two segments long, and at most $2^n - 2^{n-m-2}$, or "negative two" segments long. Since $x_\alpha y_\alpha$ and $x_\beta y_\beta$ differ by less than one segment ($2^{n-m-3}$), the translates $x_{\alpha+y_k} y_{\alpha+y_k}$ and $x_{\beta+y_k} y_{\beta+y_k}$ differ by more than $2^{n-m-3}$ and must fall into different segments. The rest of the proof follows exactly as before. In order to avoid overly cumbersome notation, let us abuse it slightly by calling the products $x_\alpha y_\alpha$ and $x_\beta y_\beta$, even though they should possibly be called $x_{\alpha+z} y_{\alpha+z}$ and $x_{\beta+z} y_{\beta+z}$.

Now that we know the products differ in a high-order bit, it remains to ensure that they are not 1-bad. Assume they are. Let $d$ be the greatest index less than $n$ of a bit position in which $x_\alpha y_\alpha$ and $x_\beta y_\beta$ differ.³

First, we claim that if the products are 1-bad, then in fact $d \geq n-m-2$. For if, say, $d = n-m-3$, then the products look like

$$x_\alpha y_\alpha = \cdots \underset{d}{1}0 \cdots, \qquad x_\beta y_\beta = \cdots \underset{d}{0}11\underset{n-m-6}{1}1 \cdots,$$

and therefore they differ modulo $2^n$ by at most $2^{n-m-7} + (2^{n-m-4} - 1)$ (since they agree in bits $d+1$ through $n-1$), but we know they differ by at least $2^{n-m-3}$. Furthermore, by the same reasoning, not only is $d \geq n-m-2$, but $x_\alpha y_\alpha$ must have a 1 in some position between $d-2$ and $n-m-4$ inclusive (note that $(x_\alpha y_\alpha)_{d-1} = 0$; else the products are not 1-bad). For otherwise, the products look like

$$x_\alpha y_\alpha = \cdots \underset{d}{1}00\cdots0\underset{n-m-4}{0} \cdots, \qquad x_\beta y_\beta = \cdots \underset{d}{0}11\cdots1\underset{n-m-6}{1}1 \cdots,$$

and thus they differ modulo $2^n$ by at most $2^{n-m-7} + 2^{n-m-4} - 1$, a contradiction.

So we are reduced to the case that the products are 1-bad, differ in position $d \geq n-m-2$, and $x_\alpha y_\alpha$ has a 1 in some position between $d-2$ and $n-m-4$. Let $\ell$ be the highest index of a 1 in this range: $x_\alpha y_\alpha = \cdots \underset{d}{1}00\cdots0\underset{\ell}{1}\cdots$. We will find a pair of variables $(x_u, y_v)$ with $u + v = n-m-6$ so that the cross terms $2^v x_\alpha$, $2^v x_\beta$, $2^u y_\alpha$, $2^u y_\beta$ all have 0's in positions $n-m-8$ through $n-1$. Then $(2^{u+v} + 2^v x_\alpha + 2^u y_\alpha)$ and $(2^{u+v} + 2^v x_\beta + 2^u y_\beta)$ both look like $\underset{n-1}{0}0\cdots0\underset{n-m-6}{1}\cdots$. We see that $x_{\alpha'} y_{\alpha'}$ looks like either $\cdots \underset{d}{1}00\cdots0\underset{\ell}{1}\cdots$ if there is no carry into position $\ell$ when $2^{u+v} + 2^v x_\alpha + 2^u y_\alpha$ is added to $x_\alpha y_\alpha$, or $\cdots \underset{d}{1}0\cdots0\underset{\ell+1}{1}\underset{\ell}{0}\cdots$ if there is a carry into position $\ell$. Meanwhile,

$$x_{\beta'} y_{\beta'} = x_\beta y_\beta + (2^{u+v} + 2^v x_\beta + 2^u y_\beta) = \cdots \underset{d}{0}11\cdots11\cdots + \cdots 00\cdots0\underset{n-m-6}{1}\cdots$$

looks like $\cdots \underset{d}{1}00\cdots0\underset{n-m-6}{0}\cdots$ or $\cdots \underset{d}{1}00\cdots0\underset{n-m-6}{1}\cdots$, depending on whether there is a carry into position $n-m-6$ in this addition.

Since $x_{\beta'} y_{\beta'}$ has 0's in positions $d-1$ through $n-m-5$, and $\ell \leq d-2$ and $\ell - 1 \geq n-m-5$, we see that

$$x_{\alpha'} y_{\alpha'} = \cdots \underset{d}{1}00\cdots0\underset{d'}{1}\cdots, \qquad x_{\beta'} y_{\beta'} = \cdots \underset{d}{1}00\cdots0\underset{d'}{0}\cdots,$$

where $d'$ is either $\ell$ or $\ell + 1$. Furthermore, the products agree in all higher bits up to $n-1$ because, by the definition of $d$, $x_\alpha y_\alpha$ and $x_\beta y_\beta$ agree in bits $d+1$ through $n-1$, and we chose $x_u$ and $y_v$ so that the cross terms have 0's in these positions. Since $\ell \geq n-m-4$, it follows that $x_{\alpha'} y_{\alpha'}$ and $x_{\beta'} y_{\beta'}$ differ in a high-order bit and are not even 1-bad.

³Without loss of generality, let us assume that in position $d$, $x_\alpha y_\alpha$ has a 1 and $x_\beta y_\beta$ has a 0.

A counting argument like that for Lemma 3 shows that we may choose $x_u$ and $y_v$ as needed. We require the cross terms to have 0's in at most $m+8$ positions. Since at most $m+1$ bits are set to 1 in $x_\alpha$ or $x_\beta$, the total number of values $v$ that we may not choose is $(m+1)(m+8) + (m+1)$. The same number of values $u$ are ruled out, making a total of at most $2(m+1)(m+9) = \frac{2n}{9} + O(\sqrt{n})$ pairs $(x_u, y_v)$ that are ruled out. Since there are $n-m-5$ pairs to choose from initially, we retain $\Omega(n)$ pairs. ■

Lemma 5 Let $T \subseteq X \cup Y$, and let $\alpha$ and $\beta$ be two settings to $T$. Let $d$ be the greatest index in $[0, n-2]$ such that $(x_\alpha y_\alpha)_d \ne (x_\beta y_\beta)_d$. Suppose $d > n-m-4$ and $\max(|T \cap X|, |T \cap Y|) = t < 2m+5$, and also that $x_\alpha y_\alpha$ and $x_\beta y_\beta$ are not $k$-bad, for some $k < m+4$. Then there are two variables $x_u, y_v \notin T$ such that

$$(x_{\alpha'} y_{\alpha'})_{d+1} \ne (x_{\beta'} y_{\beta'})_{d+1}$$

for $\alpha' = \alpha + x_u + y_v$ and $\beta' = \beta + x_u + y_v$; and moreover, $x_{\alpha'} y_{\alpha'}$ and $x_{\beta'} y_{\beta'}$ are not $(k+1)$-bad.

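Proof: We have four possible cases (up to switching $\alpha$ and $\beta$), where the leftmost bit shown in each product is in position $d$:

(1) $x_\alpha y_\alpha = \cdots 10 \cdots$ and $x_\beta y_\beta = \cdots 00 \cdots$;
(2) $x_\alpha y_\alpha = \cdots 11 \cdots$ and $x_\beta y_\beta = \cdots 00 \cdots$;
(3) $x_\alpha y_\alpha = \cdots 11 \cdots$ and $x_\beta y_\beta = \cdots 01 \cdots$;
(4) $x_\alpha y_\alpha = \cdots 1 \cdots$ and $x_\beta y_\beta = \cdots 0111 \cdots 10 \cdots$.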
By assumption, $d > n-m-4$.

Case 1: $x_\alpha y_\alpha = \cdots \underset{d}{1}\,0 \cdots$ and $x_\beta y_\beta = \cdots \underset{d}{0}\,0 \cdots$.

It is sufficient to choose $(x_u, y_v)$ so that $u + v = d$ and each of the cross terms $2^v x_\beta$, $2^u y_\beta$, $2^v x_\alpha$, and $2^u y_\alpha$ has 0's in positions $d-3$ through $d+1$. Then the sums $2^v x_\beta + 2^u y_\beta$ and $2^v x_\alpha + 2^u y_\alpha$ have 0's in positions $d-2$ through $d+1$. Adding these to $x_\beta y_\beta$ and $x_\alpha y_\alpha$ respectively therefore causes no carry into position $d$. The bits of each product below position $d-1$, together with the bits of the two cross terms below position $d-3$, sum to less than $2^{d-1} + 2^{d-2} < 2^d$. Thus the addition of $2^{u+v} = 2^d$ causes a carry into bit $d+1$ for $\alpha'$ but not for $\beta'$. Since $x_\alpha y_\alpha$ and $x_\beta y_\beta$ agree in bits $d+1$ through $n-1$, this carry bit causes the new products to differ in bit $d+1$ and possibly higher bits as well.

We now verify that $x_{\alpha'} y_{\alpha'}$ and $x_{\beta'} y_{\beta'}$ are not 1-bad. We know that $2^{u+v} + 2^v x_\beta + 2^u y_\beta$ looks like $\cdots 0\,\underset{d}{1}\,0\,0 \cdots$. Thus $x_{\beta'} y_{\beta'} = \cdots \underset{d}{0}\,0 \cdots + \cdots 0\,\underset{d}{1}\,0\,0 \cdots$ looks like either $\cdots \underset{d}{1}\,0 \cdots$ or $\cdots \underset{d}{1}\,1 \cdots$, depending on whether there is a carry into position $d-1$. Thus $x_{\beta'} y_{\beta'}$ does not have a string of 1's extending past position $d-1 > n-m-5$ and cannot make the products even 1-bad. Since the products differ in position $d+1$ or higher and $x_{\alpha'} y_{\alpha'}$ has a 0 in position $d$, the products cannot be 1-bad due to a string of 1's in $x_{\alpha'} y_{\alpha'}$.

To see that we can choose $(x_u, y_v)$ as desired, we argue as in the proof of Lemma 3. The number of positions required to be 0 is 5, ruling out $5t$ values of $v$. Of the $d+1 = n - O(\sqrt{n})$ pairs $(x_u, y_v)$ such that $u + v = d$, the number of pairs ruled out is at most $2(5t + t) = 12t < 12(2m+5) = O(\sqrt{n})$, so there are $\Omega(n)$ remaining pairs to choose from.
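The carry argument of Case 1 is easy to check on concrete numbers. The Python sketch below is our own illustration, not part of the proof, and its helper names are hypothetical. It represents a setting by the integers it induces: `xa`, `ya`, `xb`, `yb` stand for $x_\alpha, y_\alpha, x_\beta, y_\beta$, and variables outside $T$ are 0. For a candidate pair with `u + v = d`, it checks the Case 1 hypotheses and then sets $x_u = y_v = 1$ in both settings. Products are reduced modulo $2^n$.

```python
def bit(value: int, i: int) -> int:
    """Bit i of a non-negative integer (position 0 is the ones' bit)."""
    return (value >> i) & 1


def zeros_in(value: int, lo: int, hi: int) -> bool:
    """True if value has 0's in every position lo..hi; negative positions are ignored."""
    return all(bit(value, i) == 0 for i in range(max(lo, 0), hi + 1))


def case1_hypotheses(xa: int, ya: int, xb: int, yb: int, n: int, u: int, v: int) -> bool:
    """Check the Case 1 situation of Lemma 5 for the candidate pair (x_u, y_v), d = u + v."""
    mask = (1 << n) - 1
    pa, pb = (xa * ya) & mask, (xb * yb) & mask
    d = u + v
    if not 1 <= d <= n - 2:
        return False
    # The products agree in bits d+1 .. n-1, so d is the highest position where they differ.
    if any(bit(pa, i) != bit(pb, i) for i in range(d + 1, n)):
        return False
    # Case 1 pattern in positions d, d-1: x_a*y_a = ...10..., x_b*y_b = ...00...
    if (bit(pa, d), bit(pa, d - 1), bit(pb, d), bit(pb, d - 1)) != (1, 0, 0, 0):
        return False
    # x_u and y_v lie outside T, so they are 0 under both settings.
    if bit(xa, u) or bit(xb, u) or bit(ya, v) or bit(yb, v):
        return False
    # Every cross term must have 0's in positions d-3 .. d+1.
    cross_terms = [(xa << v) & mask, (xb << v) & mask, (ya << u) & mask, (yb << u) & mask]
    return all(zeros_in(term, d - 3, d + 1) for term in cross_terms)


def case1_step(xa: int, ya: int, xb: int, yb: int, n: int, u: int, v: int):
    """Set x_u = y_v = 1 in both settings and return the new products mod 2**n.

    Since bit u of x and bit v of y are 0, (x + 2**u)(y + 2**v) = xy + 2**v x + 2**u y + 2**(u+v).
    """
    mask = (1 << n) - 1
    pa = ((xa | (1 << u)) * (ya | (1 << v))) & mask
    pb = ((xb | (1 << u)) * (yb | (1 << v))) & mask
    return pa, pb
```

Take $n = 12$, $x_\alpha = x_\beta = 1$, $y_\alpha = 256$ and $y_\beta = 4$, so that $d = 8$. The pair $(u, v) = (8, 0)$ satisfies the hypotheses. The new products are $513$ and $1285$, and they differ in bit 9, as the argument predicts.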



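Cases 2 and 3: in Case 2, $x_\alpha y_\alpha = \cdots \underset{d}{1}\,1 \cdots$ and $x_\beta y_\beta = \cdots \underset{d}{0}\,0 \cdots$; in Case 3, $x_\alpha y_\alpha = \cdots \underset{d}{1}\,1 \cdots$ and $x_\beta y_\beta = \cdots \underset{d}{0}\,1 \cdots$.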

It is sufficient to choose $(x_u, y_v)$ as in Case 1 except that $u + v = d-1$. Adding $2^{d-1}$ will cause a carry to propagate into position $d+1$ for $\alpha'$ but not for $\beta'$, causing the products to differ in bit $d+1$ and possibly higher bits as well. The counting argument for choosing $(x_u, y_v)$ is exactly the same as in Case 1, except that there is one fewer pair $(x_u, y_v)$ with $u + v = d-1$.

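It only remains to show that in fact $x_{\alpha'} y_{\alpha'}$ and $x_{\beta'} y_{\beta'}$ are not 1-bad. Now $2^{u+v} + 2^v x_\alpha + 2^u y_\alpha$ looks like $\cdots 0\,\underset{d}{0}\,1\,0 \cdots$, and so does $2^{u+v} + 2^v x_\beta + 2^u y_\beta$. Thus

$$x_{\alpha'} y_{\alpha'} = \cdots \underset{d}{1}\,1 \cdots + \cdots 0\,\underset{d}{0}\,1\,0 \cdots,$$

and we see that it has a 0 in bit $d$.

Looking now at $x_{\beta'} y_{\beta'}$, we see that in Case 2, $x_{\beta'} y_{\beta'} = \cdots \underset{d}{0}\,0 \cdots + \cdots 0\,\underset{d}{0}\,1\,0 \cdots$ looks like either $\cdots \underset{d}{0}\,1 \cdots$ or $\cdots \underset{d}{1}\,0 \cdots$, depending on whether there is a carry into position $d-1$. In Case 3, $x_{\beta'} y_{\beta'} = \cdots \underset{d}{0}\,1 \cdots + \cdots 0\,\underset{d}{0}\,1\,0 \cdots$ looks like either $\cdots \underset{d}{1}\,0 \cdots$ or $\cdots \underset{d}{1}\,1 \cdots$, depending on whether there is a carry into position $d-1$. In any case, $x_{\beta'} y_{\beta'}$ does not have a string of 1's extending past $d-2 > n-m-6$, and so $x_{\alpha'} y_{\alpha'}$ and $x_{\beta'} y_{\beta'}$ are not even 1-bad.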

Case 4: $x_\alpha y_\alpha = \cdots \underset{d}{1} \cdots$ and $x_\beta y_\beta = \cdots \underset{d}{0}\,1\,1 \cdots \underset{n-m-6}{1}\,\overbrace{1 \cdots 1}^{k-1}\,0 \cdots$

Without loss of generality, let us say that $x_\beta y_\beta$ contains the maximum number, $k-1$, of consecutive 1's extending past position $n-m-6$. We choose $(x_u, y_v)$ so that $u + v = d$ and the cross terms $2^v x_\alpha$, $2^u y_\alpha$, $2^v x_\beta$ and $2^u y_\beta$ have 0's in positions $(n-m-6)-k-2$ through $n-1$. This will ensure that from $2^d$ we get a carry into position $d+1$ for $\alpha'$ but not for $\beta'$, causing the products to differ in bit $d+1$ and possibly higher bits as well.

The sum $2^v x_\beta + 2^u y_\beta$ has 0's in positions $(n-m-6)-k-1$ through $n-1$, so

$$x_{\beta'} y_{\beta'} = \cdots \underset{d}{0}\,1\,1 \cdots 1\,\overbrace{1 \cdots 1}^{k-1}\,0 \cdots + \cdots 0\,\underset{d}{1}\,0\,0 \cdots 0\,0 \cdots$$

looks like either $\cdots \underset{d}{1}\,1 \cdots 1\,\overbrace{1 \cdots 1}^{k-1}\,0 \cdots$ or $\cdots \underset{d}{1}\,1 \cdots 1\,\overbrace{1 \cdots 1}^{k-1}\,1\,0 \cdots$, depending on whether there is a carry into position $(n-m-6)-k$. So $x_{\beta'} y_{\beta'}$ has at most $k$ 1's extending past position $n-m-6$. The pair of products cannot be worse than $k$-bad because of a longer string of 1's in $x_{\alpha'} y_{\alpha'}$, since the products differ in position $d+1$ or higher and $x_{\alpha'} y_{\alpha'}$ has a 0 in position $d$. Thus $x_{\alpha'} y_{\alpha'}$ and $x_{\beta'} y_{\beta'}$ are at worst $k$-bad.

The number of positions in which we require $2^v x_\alpha$ or $2^v x_\beta$ to be 0 is $m + 6 + k + 2 \le 2m + 12$. Together, $x_\alpha$ and $x_\beta$ may rule out $t(2m+12)$ values $v$ in addition to the $t$ variables $y_v$ already in $T$. Taking into account the same number of values $u$ ruled out by $y_\alpha$ and $y_\beta$, there are at most $2(t(2m+12) + t)$ pairs $(x_u, y_v)$ that could be ruled out. Of the $d+1 = n - O(\sqrt{n})$ possible pairs $(x_u, y_v)$ with $u + v = d$, a total of at most

$$2(2m+5)(2m+13) = \frac{8n}{9} + O(\sqrt{n})$$

pairs are ruled out, leaving $\frac{n}{9} - O(\sqrt{n}) = \Omega(n)$ pairs to choose from. For $n > 56{,}000$, we can say there is at least one pair left. ■

For precision, we have given explicit values of $n$ above which our proofs hold. These numbers most likely reflect our proof technique rather than the true complexity, and should not be taken very seriously.
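The thresholds can be reproduced with a few lines of arithmetic. The Python sketch below is only an illustration, and its function names are our own. It assumes $m = \lceil \sqrt{n}/3 \rceil$. This choice is not stated in this section; we infer it from the bound $2(2m+5)(2m+13) = 8n/9 + O(\sqrt{n})$ in the proof of Lemma 5. The sketch uses $n - m - 5$ as the number of candidate pairs in Lemma 4 and the conservative count $n - m - 3$ in Lemma 5.

```python
import math


def assumed_m(n: int) -> int:
    """Assumption for illustration only: m is about sqrt(n)/3."""
    return math.ceil(math.sqrt(n) / 3)


def lemma4_surplus(n: int) -> int:
    """Pairs with u + v = n - m - 6, minus the at most 2(m+1)(m+9) that are ruled out."""
    m = assumed_m(n)
    return (n - m - 5) - 2 * (m + 1) * (m + 9)


def lemma5_surplus(n: int) -> int:
    """At least n - m - 3 pairs with u + v = d, minus the at most 2(2m+5)(2m+13) ruled out."""
    m = assumed_m(n)
    return (n - m - 3) - 2 * (2 * m + 5) * (2 * m + 13)
```

Under these assumptions, `lemma5_surplus(56000)` is 172 while `lemma5_surplus(10000)` is negative. `lemma4_surplus` is comfortably positive at both values, so the Lemma 5 count is the one that forces a threshold of this order.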
3.4 Problem reductions

We may deduce similar lower bounds for other boolean functions by the standard 
technique of problem reduction. In order to preserve read-once complexity, we will 
consider a very restrictive type of problem reduction. We begin with the notion of 
projection reductions [SV81], as defined in [CSV84]: 

Definition 5 A function $f = \{f_n\}_{n \in \mathbb{N}}$ is projection reducible to a function $g = \{g_n\}_{n \in \mathbb{N}}$, written $f \le_{\mathrm{proj}} g$, if there is a mapping

$$\sigma : \{y_1, \ldots, y_{p(n)}\} \to \{0, 1, x_1, \ldots, x_n, \bar{x}_1, \ldots, \bar{x}_n\}$$

such that

$$f_n(x_1, \ldots, x_n) = g_{p(n)}(\sigma(y_1), \ldots, \sigma(y_{p(n)}))$$

for some function $p(n)$ bounded above by a polynomial in $n$.

In other words, $f \le_{\mathrm{proj}} g$ if one can use as a black box an algorithm (circuit, branching program) for $g(y_1, \ldots, y_{p(n)})$ simply by substituting the inputs to $f$ for the inputs to $g$ and then taking the output of the algorithm as the output for $f$. These reductions were used by Chandra, Stockmeyer, and Vishkin [CSV84] in their study of constant-depth reducibility; clearly, given that $f \le_{\mathrm{proj}} g$, if $g \in AC^0$ then $f \in AC^0$.

We would like a reduction $\le'$ that allows us to deduce that if $f \le' g$ and $g \in$ READ-1, then $f \in$ READ-1. It is easy to see that projection reductions satisfy this condition if the mapping $\sigma$ is injective with respect to the $x$ variables:

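Definition 6 A function $f$ is read-once reducible to a function $g$, denoted $f \le_{\text{r-o}} g$, if there is a projection reduction $\sigma$ from $f$ to $g$ in which, for $i \ne j$ with $\sigma(y_i)$ and $\sigma(y_j)$ both non-constant,

$$\sigma(y_i) \ne \sigma(y_j) \quad\text{and}\quad \sigma(y_i) \ne \overline{\sigma(y_j)}.$$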

It follows that a read-once branching program for $f(x_1, \ldots, x_n)$ is obtained by relabelling the nodes of a read-once program for $g(y_1, \ldots, y_{p(n)})$.
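The relabelling can be made concrete. The Python sketch below uses a hypothetical representation of our own: node ids are strings, sinks are `True`/`False`, and $\sigma(y_j)$ is given either as a constant or as a pair `(x_i, positive)`. A node whose variable maps to a literal keeps its place, and its two out-edges are exchanged when the literal is negated. Nodes mapped to constants are bypassed along the fixed edge. No path of $g$'s program reads any $y_j$ twice, and $\sigma$ never sends two $y$'s to the same $x_i$. Hence no path of the new program reads any $x_i$ twice, and the size does not grow.

```python
from typing import Dict, Tuple, Union

Child = Union[str, bool]                 # a node id, or a sink (True = accept, False = reject)
Node = Tuple[str, Child, Child]          # (variable, child on 0, child on 1)
Image = Union[bool, Tuple[str, bool]]    # a constant, or (x variable, True for x_i / False for its negation)


def evaluate(bp: Dict[str, Node], root: Child, assignment: Dict[str, int]) -> bool:
    """Follow the computation path from root to a sink."""
    cur = root
    while not isinstance(cur, bool):
        var, low, high = bp[cur]
        cur = high if assignment[var] else low
    return cur


def is_read_once_projection(sigma: Dict[str, Image]) -> bool:
    """No x variable is the image (as x_i or as its negation) of two different y's."""
    targets = [image[0] for image in sigma.values() if not isinstance(image, bool)]
    return len(targets) == len(set(targets))


def relabel(bp: Dict[str, Node], root: Child, sigma: Dict[str, Image]):
    """Turn a program for g(y) into one for f(x) = g(sigma(y)) without adding nodes."""

    def resolve(child: Child) -> Child:
        # Bypass nodes whose variable sigma maps to a constant, following the fixed edge.
        while not isinstance(child, bool):
            var, low, high = bp[child]
            image = sigma[var]
            if not isinstance(image, bool):
                break
            child = high if image else low
        return child

    new_bp: Dict[str, Node] = {}
    for node_id, (var, low, high) in bp.items():
        image = sigma[var]
        if isinstance(image, bool):
            continue  # constant nodes are skipped by resolve()
        x_var, positive = image
        low, high = resolve(low), resolve(high)
        # A negated literal is handled by exchanging the two out-edges.
        new_bp[node_id] = (x_var, low, high) if positive else (x_var, high, low)
    return new_bp, resolve(root)


# Example: g(y1, y2, y3) = y1 AND y2 AND y3, with sigma(y1) = x1, sigma(y2) = 1, sigma(y3) = NOT x2.
g_bp = {"a": ("y1", False, "b"), "b": ("y2", False, "c"), "c": ("y3", False, True)}
sigma = {"y1": ("x1", True), "y2": True, "y3": ("x2", False)}
f_bp, f_root = relabel(g_bp, "a", sigma)   # computes f(x1, x2) = x1 AND NOT x2
```

In the example, the node reading `y2` disappears, and the node reading `y3` becomes a node reading `x2` with its edges swapped.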
3.4.1 Reductions to other arithmetic functions

Projection reductions have also been used to deduce tight lower bounds on the depth of polynomial-size threshold circuits. It was originally proved in [HMPST93] that INNER-PRODUCT-MODULO-2 cannot be computed in polynomial size by threshold circuits of depth 2. It was also noted there that the projection reduction to multiplication (first given in [FSS84], from PARITY to MULT) shows that MULT obeys the same lower bound.

Wegener [We93] gives projection reductions from MULT to squaring and inversion 
in order to show that these functions also require depth 3 polynomial-size threshold 
circuits. The lower bound for the middle bit of multiplication implies a lower bound 
for the appropriate bit of these two functions. We phrase the reductions in [We93] in 
terms of the following Boolean functions: 

• SQUARING: $\{0,1\}^n \to \{0,1\}$; computes "the" middle bit (here, bit $n$ rather than bit $n-1$, which we chose for MULT) in the square of an $n$-bit integer:

$$\mathrm{SQUARING}(z) = (z^2)_n.$$

• INVERSION: $\{0,1\}^n \to \{0,1\}$; computes the ones' bit in the reciprocal of an $n$-bit number between 0 and 1:

$$\mathrm{INVERSION}(x) = y_0,$$

where $x$ represents the number $.x_1 x_2 \cdots x_n = \sum_{i=1}^{n} x_i 2^{-i}$ and $y = y_n \cdots y_0$ is the integral part of $1/x$. (Note that $1 \le y \le 2^n$.) Define the function to be 0 if all $x_i$ are 0.

Wegener actually shows that

$$\mathrm{MULT} \le_{\mathrm{proj}} \mathrm{SQUARING} \le_{\mathrm{proj}} \mathrm{INVERSION},$$

except that the reductions are given for all bits of multiplication, squaring, and inversion. Though it is not noted there, we shall see that each reduction is actually a read-once reduction. The polynomial $p(n)$ of the reduction is linear in both cases. This implies that if SQUARING (or INVERSION) is computable with a read-once program of size $f(n)$, then MULT is computable with a read-once program of size $f(cn)$ for some constant $c$. This gives the following corollaries to Theorem 5:
Corollary 1 Any read-once branching program for computing the function SQUARING has size at least $2^{\Omega(\sqrt{n})}$.

Proof: We verify that the reduction in [We93] shows MULT $\le_{\text{r-o}}$ SQUARING with polynomial $p(n) = 3n+2$. In addition to verifying $p(n)$, we must also check that the reduction is indeed between these two Boolean functions and also that the mapping $\sigma$ is injective. The reduction simply maps the $n$-bit inputs $x, y$ (of MULT) to the $(3n+2)$-bit input $z = x\,2^{2(n+1)} + y$ (of SQUARING), so that $z^2 = x^2\,2^{4(n+1)} + xy\,2^{2(n+1)+1} + y^2$. Since $y^2 < 2^{2n}$ and $xy\,2^{2(n+1)+1} < 2^{4n+3}$, the three terms occupy disjoint bit positions. So the middle bit of the product $xy$ is found in the middle bit of $z^2$: $(xy)_{n-1} = (z^2)_{3n+2}$. Thus $p(n) = 3n+2$. It is clear that the mapping $\sigma$ is injective since

$$\sigma(z_i) = \begin{cases} y_i & \text{if } 0 \le i < n; \\ 0 & \text{if } n \le i < 2(n+1); \\ x_{i-2(n+1)} & \text{if } 2(n+1) \le i < 2(n+1)+n. \end{cases}$$

■
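The reduction can be checked exhaustively for small $n$. In the Python sketch below, which is our own illustration, `mult` and `squaring` are the two Boolean functions as defined in this section. Integers stand for bit vectors with $x_0$ as the ones' bit, which is an assumed convention. `sigma_input` assembles $z$ one bit at a time from the three cases of $\sigma$, so each $x_j$ and each $y_i$ is used exactly once.

```python
def mult(x: int, y: int, n: int) -> int:
    """MULT: bit n-1 of the product of two n-bit integers."""
    return ((x * y) >> (n - 1)) & 1


def squaring(z: int, N: int) -> int:
    """SQUARING on N-bit inputs: bit N of z**2."""
    return ((z * z) >> N) & 1


def sigma_input(x: int, y: int, n: int) -> int:
    """Build the (3n+2)-bit SQUARING input from x and y, one bit per case of sigma."""
    z = 0
    for i in range(3 * n + 2):
        if i < n:
            b = (y >> i) & 1                    # sigma(z_i) = y_i
        elif i < 2 * (n + 1):
            b = 0                               # sigma(z_i) = 0
        else:
            b = (x >> (i - 2 * (n + 1))) & 1    # sigma(z_i) = x_{i - 2(n+1)}
        z |= b << i
    return z


def check_reduction(n: int) -> bool:
    """MULT(x, y) == SQUARING(z) for every pair of n-bit integers x, y."""
    return all(
        mult(x, y, n) == squaring(sigma_input(x, y, n), 3 * n + 2)
        for x in range(1 << n)
        for y in range(1 << n)
    )
```

For example, `check_reduction(n)` holds for every $n$ from 1 to 5. `sigma_input(x, y, n)` always equals $x\,2^{2(n+1)} + y$.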



Corollary 2 Any read-once branching program for computing the function INVERSION has size at least $2^{\Omega(\sqrt{n})}$.

Proof: We verify that the reduction in [We93] shows SQUARING $\le_{\text{r-o}}$ INVERSION with polynomial $p(n) = 17n + 1$.



The reduction SQUARING $\le_{\mathrm{proj}}$ INVERSION reduces the problem of computing the square of an $n$-bit integer $m$ to the problem of computing $1/(1-x) = 1 + x + x^2 + x^3 + \cdots$, where

$$1 - x = 1 - m\,2^{-4n} - 2^{-10n},$$

which is a $10n$-bit number slightly less than 1. The proof in [We93] shows that the product $m^2$ lies in bit positions $-6n-1$ through $-8n$ in $1/(1-x)$, its middle bit being in position $-7n$. By instead computing the inverse of $2^{-7n}(1-x)$, a $17n$-bit number, we find the middle bit of $m^2$ in position 0.

For example, working in decimal, we may compute $5^2$ (so $n = 1$) by letting $1 - x = 1 - 5 \cdot 10^{-4} - 10^{-10}$ and calculating

$$(1 - 5 \cdot 10^{-4} - 10^{-10})^{-1} = 1.000500250225\cdots,$$

from which we may recover $5^2 = 25$ in positions $-7$ and $-8$. By instead calculating $(10^{-7} \cdot (1 - 5 \cdot 10^{-4} - 10^{-10}))^{-1}$, we may find the middle digit, 2, of 25 in position 0.
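The decimal example can be reproduced exactly with rational arithmetic. The short Python sketch below uses the standard `fractions` module. The helper `digit` is our own; it reads off the decimal digit in a given position (0 for the ones' digit, $-1$ for tenths, and so on).

```python
import math
from fractions import Fraction


def digit(value: Fraction, position: int) -> int:
    """Decimal digit of a positive rational in the given position (0 = ones, -1 = tenths, ...)."""
    return math.floor(value * Fraction(10) ** (-position)) % 10


one_minus_x = 1 - Fraction(5, 10**4) - Fraction(1, 10**10)   # 1 - 5*10^-4 - 10^-10
inverse = 1 / one_minus_x                                     # exact value of 1/(1 - x)
scaled_inverse = 1 / (Fraction(1, 10**7) * one_minus_x)       # (10^-7 (1 - x))^-1

square = 10 * digit(inverse, -7) + digit(inverse, -8)         # recovers 5**2 from positions -7, -8
middle_digit = digit(scaled_inverse, 0)                       # middle digit of 25, now in position 0
```

Here `square` evaluates to 25 and `middle_digit` to 2, matching the positions read off above.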

To see that the mapping $\sigma$ is injective, simply notice that $1 - x = 1 - m\,2^{-4n} - 2^{-10n}$ has 1's in all positions $-1$ through $-10n$, except in positions $-3n-1$ through $-4n$, where it has exactly the complements of the bits of $m$. The number $2^{-7n}(1-x)$ is similar, with extra 0's on the left. ■
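The same check can be run in binary for every $n$-bit $m$ when $n$ is small. The Python sketch below is our own illustration. `sigma_bits` lists the $17n$ input bits of $2^{-7n}(1-x)$ exactly as described above: constants, plus the complemented bits of $m$. `inversion` implements INVERSION with exact rational arithmetic.

```python
from fractions import Fraction


def inversion(bits) -> int:
    """INVERSION: ones' bit of the integral part of 1/x, where x = .b_1 b_2 ... b_N (0 if x = 0)."""
    x = sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))
    if x == 0:
        return 0
    q = 1 / x
    return (q.numerator // q.denominator) & 1


def sigma_bits(m: int, n: int) -> list:
    """The 17n bits of 2^(-7n) (1 - m 2^(-4n) - 2^(-10n)), most significant first."""
    bits = [0] * (7 * n)                      # the extra 0's on the left
    for k in range(1, 10 * n + 1):            # position -k of 1 - x
        if 3 * n + 1 <= k <= 4 * n:
            bits.append(1 - ((m >> (4 * n - k)) & 1))   # complement of bit 4n - k of m
        else:
            bits.append(1)
    return bits


def squaring_bit(m: int, n: int) -> int:
    """SQUARING: bit n of m**2."""
    return ((m * m) >> n) & 1
```

For every $n$ from 1 to 4 and every $n$-bit $m$, `inversion(sigma_bits(m, n))` agrees with `squaring_bit(m, n)`. Each bit of $m$ enters only once, through a single negated literal.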



Chapter 4 



Discussion and further work 



In this thesis, we have proved that integer multiplication requires exponential-size read-once branching programs. This fact is important for the hardware verification community, which would like to find a simple model in which multiplication can be computed with polynomial size. It was already known that the main restricted classes of oblivious branching programs, which are good candidates because of the ease with which they are manipulated, require exponential size to compute multiplication.

In the course of understanding the relevant lower bounds and related models, we 
have also assembled a survey of the structure of these low-level complexity classes, 
and also of the main ideas that have been brought to bear in thinking about their 
computation. This survey also includes a few simple proofs that have not yet appeared 
in the literature. 



Further work 

There are many open questions surrounding the topics of this thesis, some of which 
have already been mentioned. We will describe some of these problems that we consider 
to be the most important, interesting, or tractable. The oldest of these problems, 
open since [FHS78], is 

Open Question 1 Is there a deterministic polynomial-time algorithm for determining
the equivalence of two read-once programs? 

The answer to this question loses some practical significance in light of the lower bound for multiplication and the intractability of the synthesis operations, which make read-once programs less attractive as an alternative to OBDD's in hardware verification.

Multiplication 

Although perhaps not the most interesting question, there is the possibility of improving the lower bound for multiplication. We doubt that $2^{\Theta(\sqrt{n})}$ is the true read-once complexity of MULT (recall that Bryant's lower bound for OBDD's is $2^{n/8}$), but the simple counting technique used in our proof seems limited to this lower bound. It is curious that many of the lower bounds for read-once programs achieve only $2^{\Omega(\sqrt{n})}$ if $n$ is the number of input bits; only the lower bound of [BHST87] achieves a fully exponential lower bound of $2^{\Omega(n)}$. This limitation is most likely an artifact of the proofs, but it is not well understood.

In addition to improving the bound, it may also be possible to extend the argument to show that a similar bound holds for nondeterministic read-once programs or for read-$k$-times programs.

Open Question 2 Does MULT require superpolynomial nondeterministic read-once programs? ... superpolynomial read-$k$-times programs?

For nondeterministic read-once programs, we may define frontier edges as before. Now, however, it is not necessary for the inputs reaching an edge to induce the same subfunction on the remaining input variables, since inputs may follow several different paths. We can say, however, that the inputs in $\mathrm{MULT}^{-1}(1)$ that pass through a frontier edge are described by a function $f_1(X_1, Y_1) \wedge f_2(X_2, Y_2)$, where $X_1 \cup Y_1$ is in the boundary of the filter $T$ and $X_2 \cup Y_2 = (X \cup Y) \setminus (X_1 \cup Y_1)$. Thus MULT can be written as the disjunction, over all frontier edges, of such functions. We would like to show that since each of these functions must reject all of $\mathrm{MULT}^{-1}(0)$, it can accept only an exponentially small fraction of $\mathrm{MULT}^{-1}(1)$.
That is, we would like to show that given $\mathrm{MULT}(uv) = 1$ for all $u \in f_1^{-1}(1)$ and $v \in f_2^{-1}(1)$, it must be that $|f_1^{-1}(1) \times f_2^{-1}(1)| < 2^{-n^k} \cdot |\mathrm{MULT}^{-1}(1)|$ for some $k > 0$. (Here, $u$ is a setting to $X_1 \cup Y_1$ and $v$ is a setting to $X_2 \cup Y_2$.) For comparison, the proof of our lower bound (Theorem 5) in effect shows the following: given $\mathrm{MULT}(uv) = \mathrm{MULT}(u'v)$ for all $u, u' \in f_1^{-1}(1)$ and for all inputs $v$, the quantity $|f_1^{-1}(1)| \cdot 2^{|v|}$ must be a fraction $2^{-\Omega(\sqrt{n})}$ of the total number of inputs, $2^{2n}$.

Finally, we mention that there seem to be no nontrivial upper bounds for MULT in either nondeterministic or randomized read-$k$-times models, for $k = o(n)$. Of course, in all other models considered in this thesis (OBDD's, $k$-OBDD's, $k$-IBDD's, indeed any linear-length oblivious programs, even nondeterministic, as well as non-oblivious read-once programs), it is known that exponential size is required.

The read-$k$-times hierarchy

As mentioned in Section 2.4.1, it is not known whether the read-$k$-times hierarchy is strict:

Open Question 3 For some $k > 2$, is there a function computable by polynomial-size read-$k$-times programs but not computable by polynomial-size read-$(k-1)$-times programs?

In [SS93], it is conjectured that such a function is the problem of determining whether a $k$-dimensional hypergraph on $n$ nodes is $r$-regular for, say, $r = n/2$. (Recall that [SS93] proves that this problem on ordinary graphs ($k = 2$), while easily computed by read-2-times programs, requires read-once programs of size $2^{\Omega(n)}$.) The function π-MATRIX may be regarded as a special case of this problem: it is the case of determining whether a bipartite $n \times n$ graph is 1-regular. We believe that higher-dimensional versions of this latter problem should separate the read-$k$-times hierarchy.

For example, consider the 3-dimensional version, "π-CUBE", defined on an $n \times n \times n$ cube of Boolean variables, which has the value 1 exactly when each of the $n$ planes in each of the 3 dimensions contains exactly one 1. π-CUBE is easily computed with read-3-times programs. Here is a possible strategy for showing it is not computable with polynomial-size read-2-times programs. According to Theorem 1 in [BRS93], a read-2-times program for π-CUBE enables us to express the function as

$$\pi\text{-CUBE} = \bigvee_{i=1}^{\mathrm{poly}(\mathrm{BPsize})} f_{i1}(X_{i1}) \wedge f_{i2}(X_{i2}) \wedge f_{i3}(X_{i3}) \wedge f_{i4}(X_{i4}),$$

where each $X_{ij}$ is a subset of half the $n^3$ variables and each variable appears in at most two of $X_{i1}, X_{i2}, X_{i3}, X_{i4}$ for each $i$. We would like to show that a function of the form $f_{i1}(X_{i1}) \wedge f_{i2}(X_{i2}) \wedge f_{i3}(X_{i3}) \wedge f_{i4}(X_{i4})$, which rejects all of $\pi\text{-CUBE}^{-1}(0)$, can accept only an exponentially small fraction of $\pi\text{-CUBE}^{-1}(1)$.

Since each variable is in at most two of the $X_{ij}$, one of the three partitions $X_{i1} \cup X_{i2} \mid X_{i3} \cup X_{i4}$, $X_{i1} \cup X_{i3} \mid X_{i2} \cup X_{i4}$ and $X_{i1} \cup X_{i4} \mid X_{i2} \cup X_{i3}$ contains that variable on only one side of the partition ("fails to split" that variable). It follows that one of these partitions fails to split at least 1/3 of the variables. From this, we may argue further that for one of these partitions, there are at least 1/6 of the variables, $S$, that appear only on one side of the partition and at least 1/6 of the variables, $T$, that appear only on the other side. Thus, we may write (if the best partition is $X_1 \cup X_2 \mid X_3 \cup X_4$)

$$f_1(X_1) \wedge f_2(X_2) \wedge f_3(X_3) \wedge f_4(X_4) = f'(X_1 \cup X_2) \wedge f''(X_3 \cup X_4) = f'(X \setminus S) \wedge f''(X \setminus T).$$

Since $S$ and $T$ each have more than 1/8 of all the variables, there must be many coplanar pairs $(s, t) \in S \times T$. This function cannot accept two inputs $x$ and $y$ that have $s = 1$ and $t = 1$ respectively if $x$ and $y$ agree on the variables outside $S \cup T$. Otherwise it would also accept the input (which should be rejected) that looks like $x$ on $S$ and like $y$ on $T$. Furthermore, the fraction of inputs in $\pi\text{-CUBE}^{-1}(1)$ that have all 0's in a given $\frac{n}{c} \times \frac{n}{c} \times \frac{n}{c}$ subcube is exponentially small in $n$, for $c$ constant. It should be possible to combine these facts to obtain the desired lower bound.
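The two combinatorial ingredients of this strategy are easy to state concretely. The Python sketch below is illustrative only, and its helper names are our own. `pi_cube` evaluates π-CUBE on a nested-list cube. Each variable contributes to exactly three plane counts, one per dimension, which is the intuition behind the read-3-times upper bound. `unsplit_counts` takes, for each variable, the set of indices $j$ with the variable in $X_{ij}$, and counts the variables that each of the three pairings fails to split. A variable lying in exactly two of the sets is left unsplit by exactly one pairing. So the three counts sum to the number of such variables, and the largest is at least a third of it.

```python
from itertools import product


def pi_cube(cube) -> bool:
    """pi-CUBE on an n x n x n nested list of 0/1 values."""
    n = len(cube)
    # counts[axis][p]: number of 1's in plane p perpendicular to the given axis
    counts = [[0] * n for _ in range(3)]
    for i, j, k in product(range(n), repeat=3):
        if cube[i][j][k]:
            # each variable contributes to exactly three plane counts, one per dimension
            for axis, p in enumerate((i, j, k)):
                counts[axis][p] += 1
    return all(c == 1 for row in counts for c in row)


def cube_from_permutations(pi, rho):
    """The cube with 1's exactly at (i, pi[i], rho[i]); always a 1-input of pi-CUBE."""
    n = len(pi)
    cube = [[[0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        cube[i][pi[i]][rho[i]] = 1
    return cube


PAIRINGS = [({1, 2}, {3, 4}), ({1, 3}, {2, 4}), ({1, 4}, {2, 3})]


def unsplit_counts(membership) -> list:
    """membership[v] is the set of indices j in 1..4 with variable v in X_j.

    For each of the three pairings, count the variables that do not appear on both sides.
    """
    return [
        sum(1 for s in membership if not (s & left and s & right))
        for left, right in PAIRINGS
    ]
```

For instance, with one variable in each of the six possible pairs of sets plus one more in $X_1$ and $X_2$, the counts are `[3, 2, 2]`.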

Read-once reductions 

Read-once reductions appear to be rather limited in their utility. It is not clear, for example, how to use them even to show that directed $s,t$-connectivity does not have polynomial-size read-once programs. (This function, being NL-complete, is not known to have polynomial-size branching programs at all, regardless of restrictions on reading variables.) We may construct a branching program of size $O(n^3)$ for MULT in which there is an $s,t$-path if and only if MULT is 1. However, many edges are labelled with the same MULT variable, so a computation that reads each edge variable once in fact reads the variables of MULT many times. In other words, this is a projection reduction in which the variable mapping does not have the necessary injectivity property.

The Fourier spectrum 

It is an interesting question whether there is any correlation between the Fourier spec- 
trum of a function and the size of its OBDD's. 

Open Question 4 Is there a nice correlation between some property of a function's Fourier spectrum and the size of its OBDD's?

In particular, it would be useful to know which coefficients are the largest, as this is the information that is used in the remarkable algorithms for learning functions with shallow decision trees or small constant-depth circuits. As explained in Section 2.7.2, the correlations found between such functions and the properties of their spectra do not hold for OBDD's.

The ordering problem for OBDD's 

One of the most useful research directions, as far as the hardware verification community is concerned, is further analysis of the variable ordering problem described in Section 2.7.1. Now that it is known to be NP-complete, approximation algorithms (or results demonstrating the hardness of approximation) are of most interest.

Open Question 5 Is there a reasonable algorithm (in P, RP, or BPP) which, given an OBDD, finds another OBDD (possibly obeying a different ordering of the variables) whose size is within a bounded factor of optimal?

Randomized algorithms for this problem should also be considered.